(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 



(19) World Intellectual Property Organization 
International Bureau 

(43) International Publication Date 
23 August 2001 (23.08.2001) 




PCT 




(10) International Publication Number 

WO 01/61036 A2 



(51) International Patent Classification 7: C12Q 1/68

(21) International Application Number: PCT/GB01/00718

(22) International Filing Date: 19 February 2001 (19.02.2001)

(25) Filing Language: English

(26) Publication Language: English

(30) Priority Data:
20000792    17 February 2000 (17.02.2000)    NO
20012864    21 February 2000 (21.02.2000)    NO
20012863    27 February 2000 (27.02.2000)    NO

(71) Applicant (for all designated States except US): COMPLETE
GENOMICS AS [NO/NO]; P.O. Box 64, Blindern, N-0313 Oslo (NO).

(71) Applicant (for MW only): TOWLER, Philip, Dean
[GB/GB]; Frank B. Dehn & Co., 179 Queen Victoria
Street, London EC4V 4EL (GB).

(72) Inventor; and
(75) Inventor/Applicant (for US only): LEXOW, Preben
[NO/NO]; Bloksbergveien 16, N-3132 Husoysund (NO).

(74) Agents: TOWLER, Philip, Dean et al.; Frank B. Dehn &
Co., 179 Queen Victoria Street, London EC4V 4EL (GB).

(81) Designated States (national): AE, AG, AL, AM, AT, AT
(utility model), AU, AZ, BA, BB, BG, BR, BY, BZ, CA,
CH, CN, CR, CU, CZ, CZ (utility model), DE, DE (utility
model), DK, DK (utility model), DM, DZ, EE, EE (utility
model), ES, FI, FI (utility model), GB, GD, GE, GH, GM,
HR, HU, ID, IL, IN, IS, JP, KE, KG, KP, KR, KZ, LC, LK,
LR, LS, LT, LU, LV, MA, MD, MG, MK, MN, MW, MX,
MZ, NO, NZ, PL, PT, RO, RU, SD, SE, SG, SI, SK, SK

[Continued on next page]



(54) Title: A METHOD OF MAPPING RESTRICTION ENDONUCLEASE CLEAVAGE SITES 






(57) Abstract: The invention provides a two-step sorting procedure
where it is possible to scan the overhanging single-stranded ends of
nucleic acid fragments quickly and efficiently using solid supports,
such as microarrays. Use is made of two different sets of degenerate
overhang-adaptors in this regard. The invention also provides new
methods and strategies inter alia for collecting information about
sequences and cleavage sites that are between the cleavage sites that
have generated an overhang pair. An effective method of producing the
restriction map, making it easier to create multiple maps, is also
described.






(utility model), SL, TJ, TM, TR, TT, TZ, UA, UG, US, UZ, 
VN, YU, ZA, ZW. 

(84) Designated States (regional): ARIPO patent (GH, GM,
KE, LS, MW, MZ, SD, SL, SZ, TZ, UG, ZW), Eurasian
patent (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), European
patent (AT, BE, CH, CY, DE, DK, ES, FI, FR, GB, GR, IE,
IT, LU, MC, NL, PT, SE, TR), OAPI patent (BF, BJ, CF,
CG, CI, CM, GA, GN, GW, ML, MR, NE, SN, TD, TG).



Published: 

— without international search report and to be republished 
upon receipt of that report 



For two-letter codes and other abbreviations, refer to the "Guidance
Notes on Codes and Abbreviations" appearing at the beginning of each
regular issue of the PCT Gazette.






A Method of Mapping Restriction Endonuclease Cleavage Sites 

The present invention relates to new methods of mapping restriction
endonuclease cleavage sites.

Traditionally, DNA molecules have been mapped using Type II
restriction endonucleases such as EcoRI and HindIII, which have
well-defined recognition and cleavage sites. After cleavage with the
restriction endonucleases, the DNA fragments are generally run on an
agarose gel together with DNA markers of known size and visualised using
EtBr under UV light.

More recently, use has been made of Type IIs restriction
endonucleases which have cleavage sites outside their recognition sites.
Reference is made in this regard to US 5,858,656 and Gene, 145 (1994)
163-169.

However, there remains a need to provide effective methods for
determining the sequence of the single-strand overhangs that are created with
Type IIs restriction endonucleases. The invention therefore provides a
two-step sorting procedure where it is possible to scan the overhangs quickly
and efficiently using solid supports such as microarrays. Furthermore, the
invention provides new methods and strategies inter alia for collecting
information about sequences and cleavage sites that are between the cleavage
sites that have generated an overhang pair. An effective method of producing
the restriction map, making it easier to create multiple maps, is also
described.

The invention therefore provides a method of mapping a target nucleic
acid molecule, the method comprising the steps of:

(a) treating the target nucleic acid molecule with one or more restriction
endonucleases to produce one or more nucleic acid fragments having
first and second 5'- or 3'-single-stranded overhanging ends,

(b) adding the nucleic acid fragments to a first set of overhang-adaptors,

each overhang-adaptor of the first set comprising a nucleic acid
molecule comprising at least one 5'- or 3'-single-stranded end,

the single-stranded ends of the overhang-adaptors being of lengths and
orientations (i.e. 5'- or 3'-) corresponding to the lengths and
orientations of the overhanging single-strands of the cleavage sites of
the said restriction endonucleases,

wherein said first set comprises a collection of overhang-adaptors
whose single-stranded ends collectively encode up to all possible
permutations and combinations of the nucleotides A, C, G and T,

and wherein each overhang-adaptor in the said first set is spatially
separable from every other different overhang-adaptor in the first set;

(c) contacting the said nucleic acid fragments with a nucleic acid ligase to
cause selective ligation of the nucleic acid fragments with those
overhang-adaptors whose 5'- or 3'-single-stranded ends are fully
complementary to the 5'- or 3'-overhanging single-stranded ends of the
nucleic acid fragments,

thus forming a plurality of separable populations of nucleic acid
fragments which are ligated at their first ends to a first
overhang-adaptor;

optionally, removing the unligated nucleic acid fragments;

(d) identifying the sequence of the second overhanging single-stranded end
of the nucleic acid fragments; and

(e) comparing the sequences of the ends of the nucleic acid fragments in
order to produce a map of the target nucleic acid molecule.
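By way of illustration only, the comparison in step (e) can be sketched in Python under strongly simplifying assumptions that are not part of the method as described: the target is linear, every cut produces a unique, non-palindromic overhang, and an uncut terminus of the target is marked `None`. The function name `order_fragments` and the data layout are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def order_fragments(fragments):
    """Order fragments of a linear target by matching complementary overhangs.

    fragments: list of (end_a, end_b) overhang sequences written 5'->3';
    None marks an original terminus of the target (assumption).
    Returns the fragments in map order, each oriented left to right.
    """
    remaining = list(fragments)
    # Start from a fragment that carries an original terminus of the target.
    current = next(f for f in remaining if None in f)
    remaining.remove(current)
    if current[0] is not None:
        current = current[::-1]
    ordered = [current]
    while ordered[-1][1] is not None:
        # The neighbouring fragment carries the complementary overhang.
        wanted = reverse_complement(ordered[-1][1])
        nxt = next((f for f in remaining if wanted in f), None)
        if nxt is None:
            break
        remaining.remove(nxt)
        ordered.append(nxt if nxt[0] == wanted else nxt[::-1])
    return ordered

example = [("TACC", "AAAC"), ("GGTA", None), ("GTTT", None)]
print(order_fragments(example))
```

Real digests will contain repeated or palindromic overhangs, so this chaining is only a starting point for the comparison.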

The invention also provides methods for identifying the overhanging
ends of a nucleic acid fragment comprising the steps (b)-(d) as described
above.

As used herein, the term "mapping a target nucleic acid molecule"
means providing information on the order of some or all of the fragments into
which the target nucleic acid molecule may be divided or on the position of
discrete sequences, e.g. restriction endonuclease cleavage sites, within the
target nucleic acid molecule. The mapping of a target nucleic acid molecule
will often facilitate its subsequent sequencing.

As used herein, the term "target nucleic acid molecule" refers to any
nucleic acid molecule, for example a naturally occurring, synthetic or
recombinant polynucleotide molecule. The term includes DNA, such as
genomic, cDNA or vector DNA; RNA, such as mRNA; and PNA and their
analogues. Generally, the term relates to a double-stranded nucleic acid
molecule, most preferably a DNA molecule.

The target nucleic acid molecule is treated, i.e. digested, cleaved or
cut, with one or more restriction endonucleases in order to divide up the
target nucleic acid molecule into one or more nucleic acid fragments. Each of
these fragments will have two ends, i.e. the first and second ends, having
overhanging, i.e. single-stranded, stretches of nucleotides.

The invention particularly relates to the use of restriction
endonucleases which cleave DNA to produce overhanging ends which are
non-identical in sequence and/or have overhanging sequences which are
unrelated to the recognition sequence of the restriction enzyme used.
Preferably, the restriction endonuclease is a Type IIp or Type IIs restriction
endonuclease.

Type IIp restriction endonucleases generate degenerate overhangs in
the middle of their recognition sequences.

Type IIs restriction endonucleases interact with two discrete sites on
double-stranded DNA: the recognition site which is 4-7 bp (bp = base pairs)
long and the cleavage site which is usually 1-20 bp away from the recognition
site. Overhangs of -6 to +5 nucleotides are usually produced. These
endonucleases exhibit no specificity to the sequence that is cut and they can
therefore generate overhangs with all types of nucleotide compositions.
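The separation of recognition site and cleavage site can be illustrated with a minimal Python sketch. It is an assumption-laden model rather than a description of any particular protocol: only sites on the given (top) strand are considered, and the default recognition sequence GGATG with cut offsets of 9 and 13 nucleotides are the values commonly cited for Fok I, not values stated in this text. The function name `type_iis_overhangs` is hypothetical.

```python
def type_iis_overhangs(top_strand, site="GGATG", top_offset=9, bottom_offset=13):
    """Return the 5' overhangs produced downstream of each forward-strand site.

    The top strand is cut top_offset nucleotides after the recognition site
    and the bottom strand bottom_offset nucleotides after it, so the bases in
    between form a single-stranded overhang on the downstream fragment.
    Each overhang is reported as top-strand sequence, written 5'->3'.
    """
    overhangs = []
    start = top_strand.find(site)
    while start != -1:
        site_end = start + len(site)
        cut_top = site_end + top_offset
        cut_bottom = site_end + bottom_offset
        # Skip sites too close to the end of the molecule to be cut.
        if cut_bottom <= len(top_strand):
            overhangs.append(top_strand[cut_top:cut_bottom])
        start = top_strand.find(site, start + 1)
    return overhangs

target = "GGATG" + "A" * 9 + "CTGA" + "TTTT"
print(type_iis_overhangs(target))
```

Because the overhang is taken from whatever sequence lies at the offset, changing the bases at that position changes the overhang without affecting recognition.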

Over 70 classes of Type IIs restriction endonucleases have been
identified and there are large variations both with respect to substrate
specificity and cleavage pattern. In addition, these enzymes have proved to be
well suited to "module swapping" experiments so that one can create new
enzymes for particular requirements (Huang-B, et al., J-Protein-Chem. 1996,
15(5):481-9; Bickle, T.A., 1993, in Nucleases (2nd edn); Kim-YG et al.,
PNAS 1994, 91:883-887). Very many combinations and variants of these
enzymes can therefore be used according to the principles described herein.

Examples of Type IIs restriction endonucleases which may be used in
this regard include Bbv I, Bce83 I, Bcef I, Bmp I, Bsg I, BspLU11 III, Bst71
I, Eco57 I, Fok I, Gsu I, Hga I, Mme I and the like.

Preferably, Type IIs restriction endonucleases are used which produce
overhangs of 3-5 nucleotides, preferably 3 or 4 nucleotides, either at the
5'-end or the 3'-end of the nucleic acid fragment.

Particularly preferred restriction endonucleases are AlwNI, BslI,
DraIII, PflMI, BstXI, BplI, BaeI, EarI, SapI, BbsI, BbvI, BsaI, FokI, SfaNI
and HgaI.

In one preferred embodiment of the invention, combinations of Type
IIs restriction endonucleases are used which either all produce 5'-overhangs
or all produce 3'-overhangs. This obviates the need for sets of
overhang-adaptors with both 5'- and 3'-single-stranded ends.

Alternatively, the restriction endonuclease is one with an interrupted
palindromic recognition sequence which cuts at sites which are independent
of the intervening sequences, provided that the intervening sequence is of the
appropriate length.

In the context of this invention, any reference to a Type IIs restriction
endonuclease should also be considered to be a reference to a Type IIp
restriction endonuclease.

In a preferred embodiment of the invention, the target nucleic acid
molecule is treated with only one restriction endonuclease. In this case, the
restriction endonuclease is preferably a Type IIp or IIs restriction
endonuclease.

In another preferred embodiment of the invention, the target nucleic
acid molecule is treated with more than one restriction endonuclease, wherein
the restriction endonucleases either all produce 5'-overhanging ends or all
produce 3'-overhanging ends.

The digested nucleic acid fragments are then added to a first set of
overhang-adaptors.

In the context of the present invention, the term "overhang-adaptor"
refers to a structure comprising a nucleic acid molecule comprising, i.e.
consisting at least of, a 5'- or 3'-single-stranded nucleic acid end.

The essential feature of each of the overhang-adaptors is that they
possess at least one free 5'- or one free 3'-single-stranded nucleic acid end.
The remaining part(s) of the overhang-adaptor should allow the binding of
the single-stranded end of the overhang-adaptor to a single-stranded end of
the nucleic acid fragments. For example, the remaining part of the
overhang-adaptor may be a single-stranded or double-stranded nucleic acid
molecule, preferably a DNA molecule. Most preferably, the overhang-adaptor
is a single-stranded DNA molecule or oligonucleotide.

In this context, the term "single-stranded ends" of the
overhang-adaptors refers to that part of the overhang-adaptor which might be
complementary to a single-stranded overhang of the nucleic acid fragments.

Thus it can be seen that the end of the overhang-adaptor which binds
to the nucleic acid fragment may be single-stranded DNA and also the
remaining part of the overhang-adaptor may be single-stranded DNA. The
single-stranded DNA may, for example, be an oligonucleotide of total length
10-50 nucleotides, preferably 12-30 nucleotides, and most preferably 13-20
nucleotides. In some embodiments of the invention, overhang-adaptors
which are double-stranded DNA molecules having single-stranded 5'- or
3'-overhangs are excluded.

The single-stranded ends of the overhang-adaptors are of lengths and 
orientations which correspond to the lengths and orientations of the 
overhanging single-strands of the cleavage sites of the restriction 
endonucleases used. 

The lengths and orientations of the cleavage sites of the restriction
endonucleases will be known in each case. In this context, the term
"orientation" merely refers to whether the single-stranded overhang produced
by cleavage with the restriction endonuclease is a 5'-overhang or a
3'-overhang.

It will be appreciated that the single-stranded ends of the nucleic acid
fragments and overhang-adaptors must be generally complementary in form,
i.e. where the nucleic acid fragments all have 5'-single-stranded overhangs,
the single-stranded ends of the overhang-adaptors (both the first and second
sets) will all be 5'- to allow binding thereto; and where the nucleic acid
fragments all have 3'-single-stranded overhangs, the single-stranded ends of
the overhang-adaptors (both the first and second sets) will all be 3'- to allow
binding thereto. Where the nucleic acid fragments have combinations of 5'-
and 3'-single-stranded overhangs, then the sets of the overhang-adaptors must
also contain adaptors having 5'- and 3'-single-stranded ends.

In one embodiment of the invention, the overhang-adaptors are
single-stranded DNA molecules which have mirror-image sequences at each end
(for example, 5'-CATC-----GTAG-3'). In between the sequences is a stretch
of DNA or other structure which allows the overhang-adaptor to form a loop.
The overhang-adaptor is then bound to a solid support, if necessary, in the
region between the two end sequences. In this way, overhang-adaptors at any
one spatial location or address will bind to the same specific single-stranded
sequence whether that sequence is a 5'-single-stranded sequence or a
3'-single-stranded sequence.

With regard to any one restriction endonuclease that is used, it will be
appreciated that the single-stranded ends of the nucleic acid fragments and
the single-stranded ends of the overhang-adaptors must be generally of the
same length. Thus for example, where the nucleic acid fragments all have
5'-single-stranded overhangs of length n, the single-stranded nucleic acids of
the overhang-adaptors (both the first and second sets) will all be
5'-single-stranded overhangs of length n to allow binding thereto; and where
the nucleic acid fragments all have 3'-single-stranded overhangs of length n,
the single-stranded nucleic acids of the overhang-adaptors (both the first and
second sets) will all be 3'-single-stranded overhangs of length n to allow
binding thereto.

However, if the chosen combination of restriction endonucleases
produces overhangs of different lengths, then the set of overhang-adaptors will
need to comprise single-stranded ends which are capable of binding to each of
these different length overhangs. It will be appreciated, however, that if
adaptors are used having single-stranded ends of a length that corresponds to
the longest of the overhangs produced by the chosen restriction
endonucleases, then the ends of such adaptors should also be capable of
binding the shorter overhangs. Under such circumstances, a modification of
the method used to identify the nucleic acid fragments which have been
ligated to the second overhang-adaptor might be required.

The first set comprises a collection of overhang-adaptors whose 5'-
and/or 3'-single-stranded ends collectively encode up to all possible
permutations and combinations of the nucleotides A, C, G and T, i.e. the
single-stranded ends comprise a set of degenerate sequences of nucleotides
corresponding to the length and orientation of the overhanging ends of the
nucleic acid fragments. Thus within the set of overhang-adaptors, there will
be individual overhang-adaptors that are capable of hybridising and ligating to
each of the individual first ends of the nucleic acid fragments. In some
embodiments, universal nucleotides may be used at one or more of the
positions in the single-stranded ends of the overhang-adaptors.

For example, if the length of the overhang produced by the restriction
endonuclease is 4, then the first set of overhang-adaptors will comprise
AAAA, AAAC, AAAG, AAAT, AACA, AACC, etc. In general, where n is
the length of the overhang, the first set of overhang-adaptors will consist of
all or essentially all of 4^n adaptors. Thus where n=4, a set of 256
overhang-adaptors will be used.
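The enumeration of a complete first set can be sketched in Python as follows. This is an illustration only; the function name `adaptor_ends` is hypothetical and the ordering of the set is an arbitrary choice.

```python
from itertools import product

NUCLEOTIDES = "ACGT"

def adaptor_ends(n):
    """Return every possible single-stranded end of length n (4**n sequences)."""
    return ["".join(bases) for bases in product(NUCLEOTIDES, repeat=n)]

ends = adaptor_ends(4)
print(len(ends))   # 256
print(ends[:6])    # ['AAAA', 'AAAC', 'AAAG', 'AAAT', 'AACA', 'AACC']
```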

If combinations of restriction endonucleases are used, all of which
produce overhangs of the same orientation and of the same length n, then
generally a set of 4^n overhang-adaptors will be required. However, if
combinations of restriction endonucleases are used all of which produce
overhangs of the same length n but with different orientations, then generally
a set of 2 x 4^n adaptors will be required. If restriction endonucleases are
used which produce overhangs of different lengths, then the same principles
apply, mutatis mutandis.
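The set sizes above follow from simple counting, which may be sketched in Python as below. The function name `adaptors_required` is hypothetical, and treating overhangs of different lengths as needing separate complete sets is one reading of "mutatis mutandis", stated here as an assumption.

```python
def adaptors_required(overhangs):
    """Count adaptors needed for a combination of restriction endonucleases.

    overhangs: iterable of (orientation, length) pairs, one per enzyme,
    where orientation is "5'" or "3'". Enzymes sharing orientation and
    length share one complete set of 4**length adaptors.
    """
    distinct = set(overhangs)
    return sum(4 ** length for _, length in distinct)

print(adaptors_required([("5'", 4), ("5'", 4)]))  # 256, i.e. 4^n
print(adaptors_required([("5'", 4), ("3'", 4)]))  # 512, i.e. 2 x 4^n
```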

If desired, one or more of A, C, G or T may be replaced by an 
alternative nucleotide, i.e. U for T, or I. In particular, universal nucleotides 
which bind to A, C, G and T may be used in one or more positions in the 
overhang-adaptors. 

It should be noted that the number of adaptors required may be
reduced if not all of the nucleotides in an overhang are read. Thus it is
possible to read only 3 out of 5 nucleotides in an overhang, thus reducing the
number of required adaptors from 1024 (4^5) to 64 (4^3). In such a case,
universal nucleotides which bind to A, C, G or T may be used in the adaptors.
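A reduced set of this kind can be sketched in Python as follows. The symbol `N` standing for a universal nucleotide, the choice of which positions are read, and the function name `masked_adaptor_ends` are all assumptions made for illustration.

```python
from itertools import product

def masked_adaptor_ends(length, read_positions, universal="N"):
    """Enumerate adaptor ends in which only some positions are read.

    Positions listed in read_positions take each of A, C, G and T;
    all other positions carry the universal nucleotide symbol.
    """
    read_positions = sorted(set(read_positions))
    ends = []
    for combo in product("ACGT", repeat=len(read_positions)):
        end = [universal] * length
        for pos, base in zip(read_positions, combo):
            end[pos] = base
        ends.append("".join(end))
    return ends

reduced = masked_adaptor_ends(5, [0, 1, 2])
print(len(reduced), reduced[0])  # 64 AAANN
```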

For the purposes of the invention, each overhang-adaptor in the said 
first set will be spatially separable or spatially separate from every other 
different overhang-adaptor in the first set; and the spatial position or address 
of each overhang-adaptor and the sequence of its single-stranded end will be 
known. Thus, for example, it will be possible to distinguish between 
overhang-adaptors having AAAA single-stranded ends from overhang- 
adaptors having AAAC or AAAG single-stranded ends. In this context, 
therefore, the term "spatially separable" is intended to mean that the different 
overhang-adaptors might be spatially separated or physically separated from 
one another, for example, in separate compartments or wells, or attached to 
distinct or defined areas of a solid support, such as a microarray. In one 
embodiment of the invention, samples of each of the different overhang- 
adaptors of the first set are transferred for use in the second stage of the 
mapping method and hence each of the different overhang-adaptors needs to 
be physically distinguishable from all of the others. 

After the nucleic acid fragments are added to the first set of
overhang-adaptors, the nucleic acid fragments are contacted with a nucleic acid
ligase to cause selective ligation of the nucleic acid fragments with those
overhang-adaptors of the first set whose 5'- or 3'-single-stranded ends are
fully complementary to the 5'- or 3'-overhanging single-stranded ends of the
nucleic acid fragments. In this way, a plurality of separable or physically
distinguishable populations of nucleic acid fragments are formed which are
ligated at their first ends to a first overhang-adaptor.

Preferably, the overhang-adaptors (of both sets) are treated with
phosphatase prior to use in order to reduce the occurrence of ligation between
adjacent overhang-adaptors.

Following the addition of nucleic acid ligase (which is preferably a
DNA ligase), ligation is allowed to occur for an appropriate length of time for
the single-stranded ends of the overhang-adaptors which are fully
complementary to the overhanging single-stranded ends of the nucleic acid
fragments to be ligated thereto.

The ligation step may be replaced by any other process which
selectively binds the single-stranded ends of the overhang-adaptors to the fully
complementary overhanging ends of the nucleic acid fragments.

In some embodiments of the invention, the reference to ligation and
ligating the nucleic acid fragments may be replaced by a chemical ligation,
such as that described in Nature Biotechnology, vol. 19, February 2001,
pp. 148-152, Xu et al.

Thus upon contacting the nucleic acid fragments with the first set of
overhang-adaptors, the complementary ends of these two groups of molecules
are allowed to hybridise and be ligated to one another. For example, if the
target nucleic acid molecule is cut with the Type IIs restriction endonuclease
Fok I, 4-nucleotide 5'-overhanging ends will be produced in the nucleic acid
fragments (assuming that at least one Fok I site is present in the target DNA).
This might, for example, produce a 5'-overhanging end having the sequence
5'-GATC-3'. This overhanging end would then selectively hybridise to the
overhang-adaptor with the 5'-end sequence of 5'-GATC-3'. Upon the
addition of DNA ligase, the adjacent 3'-end of the nucleic acid fragment
would then be ligated to the 5'-end of the overhang-adaptor.
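The selection of the fully complementary adaptor can be illustrated with a short Python sketch. It assumes that overhangs and adaptor ends are both written 5'->3', so that full complementarity means the adaptor end equals the reverse complement of the overhang; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def ligatable_adaptors(fragment_overhang, adaptor_ends):
    """Return the adaptor ends that are fully complementary to the overhang."""
    target = reverse_complement(fragment_overhang)
    return [end for end in adaptor_ends if end == target]

# GATC is its own reverse complement, so the matching adaptor end is also GATC.
print(ligatable_adaptors("GATC", ["GATC", "GATT", "AATC"]))
# A non-palindromic overhang pairs with a different adaptor end.
print(ligatable_adaptors("AAAC", ["GTTT", "AAAC"]))
```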

The overhang-adaptors may either be attached to or carrying a means 
for attaching to a solid support. 

In one preferred embodiment of the invention, overhang-adaptors are
fixed to solid supports. This may be achieved in a number of different ways.
The overhang-adaptors may be attached to one or more moieties which allow
binding of that overhang-adaptor to a solid support, for example the end (or
several internal sites) may be provided with one partner of a binding pair, e.g.
with biotin which can then be attached to a streptavidin-carrying solid
support.

Overhang-adaptors may be engineered to carry such a binding moiety
in a number of known ways. For example, a PCR reaction may be conducted
to introduce the binding moiety, e.g. by using an appropriately-labelled
primer. Alternatively, the overhang-adaptor may be ligated to a binding
moiety, e.g. by cleaving the overhang-adaptor with a restriction enzyme and
then ligating it to an adapter/linker whose end has been labelled with a
binding moiety. Such a strategy would be particularly suitable if a Type IIs
restriction endonuclease is used that forms a non-palindromic overhang.
Another alternative is to clone the overhang-adaptor into a vector which
already carries a binding moiety, or that contains sequences that facilitate the
introduction of such a moiety.

Alternatively, overhang-adaptors may be attached to solid supports
without the need to attach a binding moiety insofar as the overhang-adaptor
itself is one partner of the binding pair. Thus, for example, short PNA
molecules that are attached to a solid support may be used. PNA molecules
have the ability to hybridize and bind to double-stranded DNA, and
overhang-adaptors can therefore be attached to a solid support with this
strategy. Similarly, oligonucleotide probes may be used to bind complementary
sequences to a solid support.

Appropriate solid supports suitable as immobilizing moieties for
attaching the overhang-adaptors are well known in the art and widely
described in the literature. Generally speaking, the solid support may be any
of the well-known supports or matrices which are currently widely used or
proposed for immobilization, separation, etc., in chemical or biochemical
procedures. Thus for example, the immobilizing moieties may take the form
of beads, particles, sheets, gels, wells, filters, membranes, microfibre strips,
tubes or plates, fibres or capillaries, made for example of a polymeric material,
e.g. agarose, cellulose, alginate, teflon, latex or polystyrene. Particulate
materials, e.g. beads, are generally preferred. Conveniently, the immobilizing
moiety may comprise magnetic particles, such as superparamagnetic particles.
In a further preferred embodiment, plates or sheets are used to allow fixation
of molecules in linear arrangement. The plates may also comprise walls
perpendicular to the plate on which molecules may be attached. Attachment
to the solid support may be performed directly or indirectly. For attaching
the target molecules, conveniently attachment may be performed indirectly by
the use of an attachment moiety carried on the nucleic acid molecules and/or
solid support. Thus for example, a pair of affinity binding partners may be
used, such as avidin, streptavidin or biotin, DNA or DNA binding protein
(e.g. either the lac I repressor protein or the lac operator sequence to which it
binds), antibodies (which may be mono- or polyclonal), antibody fragments
or the epitopes or haptens of antibodies. In these cases, one partner of the
binding pair is attached to (or is inherently part of) the solid support and the
other partner is attached to (or is inherently part of) the nucleic acid
molecules. Other techniques of direct attachment may be used such as for
example if a filter is used, attachment may be performed by UV-induced
crosslinking. When attaching DNA fragments, the natural propensity of
DNA to adhere to glass may also be used.

Attachment of appropriate functional groups to the solid support may
be performed by methods well known in the art, which include for example,
attachment through hydroxyl, carboxyl, aldehyde or amino groups which may
be provided by treating the solid support to provide suitable surface coatings.
Attachment of appropriate functional groups to the nucleic acid molecules of
the invention may be performed by ligation or introduced during synthesis or
amplification, for example using primers carrying an appropriate moiety, such
as biotin or a particular sequence for capture.

Attachment to a solid support may be performed before or after
overhang-adaptors have been produced. For example, overhang-adaptors
carrying binding moieties may be attached to a solid support and thereafter
treated with DNase I or similar. Alternatively, cleavage may be effected and
then the fragments may be attached to the support.

Thus one strategy which may be used is to fix polynucleotides that 
complement the overhang-adaptors that are to be isolated to a solid support 
(the inside of a well, mono-dispersed spheres, microarrays, etc.). 

Most preferably, the ligation reaction is carried out in free solution, i.e.
where the overhang-adaptors are not attached to a solid support. The
efficiency of the ligation reaction may be improved in this way. In such
circumstances, the overhang-adaptors carry a means for attaching to a solid
support, for example, biotin. Optionally, after the ligation reaction, the
overhang-adaptors are bound to a solid support.

An alternative is to fix the overhang-adaptors to a solid support such as
paramagnetic beads or similar.

A washing step is preferably carried out following the ligation of the
nucleic acid fragments and the first set of overhang-adaptors in order to
remove unligated nucleic acid fragments. The overhang-adaptors will
generally be immobilised or bound to a solid support during the washing step.

It should be pointed out that the specificity of the method can be
adjusted to most purposes by repeating steps (b) and (c) one or several times,
with a washing step in between if desired. It may also be appropriate to use
competing probes/overhangs during step (b) in order to increase specificity.

At the end of step (c), a plurality of spatially separable or separate
populations of nucleic acid fragments is formed which are ligated at their first
ends to a first overhang-adaptor. Since the spatial position (i.e. the address)
and the single-stranded end sequence of each of the first overhang-adaptors
will be known, this will provide information on the sequences of the first
overhanging ends of the nucleic acid fragments. Thus the sequence of the first
overhanging end of each of the nucleic acid fragments is informationally
linked to its spatial position or address.
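The informational link between address and sequence can be sketched in Python as a lookup table. The 16-column grid layout and the function names `build_address_table` and `first_overhang_at` are hypothetical; any layout in which each adaptor end has a known address would serve.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def build_address_table(n, columns=16):
    """Map (row, column) addresses on an assumed grid to adaptor ends of length n."""
    table = {}
    for index, bases in enumerate(product("ACGT", repeat=n)):
        table[divmod(index, columns)] = "".join(bases)
    return table

def first_overhang_at(address, table):
    """Infer the overhang of fragments captured at an address.

    A ligated fragment's overhang is the reverse complement of the
    adaptor end known to be located at that address.
    """
    return reverse_complement(table[address])

table = build_address_table(4)
print(table[(0, 1)], first_overhang_at((0, 1), table))  # AAAC GTTT
```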

It should be noted that the first overhanging ends of the nucleic acid
molecules will at this point have been inactivated through ligation to the first
overhang-adaptors, i.e. the first overhanging ends of the nucleic acid
fragments will no longer be capable of binding to further overhang-adaptors.
The second overhanging ends of the nucleic acid fragments will essentially
still be unbound.

The ligation (and subsequent washing step, if required) marks the end 
of the first stage of the mapping method. 

The sequences of the second overhanging single-stranded ends of the
nucleic acid fragments are then identified. This may be done in a number of
different ways:

Second stage - Method 1. Preferably, step (d) is carried out by: r 

(dl) optionally releasing each population of ligated nucleic acid fragments 
from the solid support, 

25 

selectively contacting each population of nucleic acid fragments which 
were ligated at their first ends to a first overhang-adaptor with a second 
set of overhang-adaptors, 

3 0 each overhang-adaptor of the second set comprising a nucleic acid 

molecule comprising at least one 5 - or 3-single-stranded end, 



the single-stranded ends of the overhang-adaptors of the second set 
being of lengths and orientations (i.e. 5 1 - or 3 1 -) corresponding to the 
3 5 lengths and orientations of the overhanging single-strands of the 

cleavage sites of the said restriction endonucleases, 

wherein said second set comprises a collection of overhang-adaptors 
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whose single-stranded ends collectively encode up to all possible 
permutations and combinations of the nucleotides A, C, G and T, 

and wherein each overhang-adaptor in the said second set is spatially 
5 distinguishable from every other different overhang-adiptor in the 

second set; 

(d2) contacting the nucleic acid fragments with a nucleic acid ligase to
cause selective ligation of the nucleic acid fragments with those
overhang-adaptors of the second set whose 5'- or 3'-single-stranded
ends are fully complementary to the second 5'- or 3'-overhanging ends
of the nucleic acid fragments;

thus forming a plurality of populations of nucleic acid fragments which
are ligated at their second ends to a second overhang-adaptor, and

optionally removing the non-ligated nucleic acid fragments;

(d3) identifying the sequences of the first and second overhanging ends of
each of the nucleic acid fragments from the spatial positions of the
second overhang-adaptors to which the nucleic acid fragments are
ligated.
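As a rough illustration of step (d3), the sketch below shows how a pair of spatial positions could be translated back into a pair of end sequences. The layout is an assumption made only for this example: the first-stage positions ("wells") and second-stage positions ("spots") are both numbered in the alphabetical order of the adaptor end sequences. The function names are hypothetical and the method itself does not prescribe any particular layout.

```python
from itertools import product


def all_overhangs(n):
    """Return every n-nucleotide sequence over A, C, G, T in alphabetical order."""
    return ["".join(p) for p in product("ACGT", repeat=n)]


def build_address_table(n):
    """Map an assumed (well, spot) address to a (first end, second end) pair.

    well: index of the first overhang-adaptor that captured the population
    spot: index of the second overhang-adaptor the fragment was ligated to
    """
    ends = all_overhangs(n)
    return {
        (well, spot): (first, second)
        for well, first in enumerate(ends)
        for spot, second in enumerate(ends)
    }


table = build_address_table(4)
print(len(table))      # 4**4 wells times 4**4 spots
print(table[(1, 3)])   # adaptor end sequences recorded at that address
```

With four-nucleotide ends this table has 256 x 256 entries, which is why a machine-readable microarray is the natural support for the second set.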

In a preferred aspect of the invention, steps (b)-(d2) are carried out
simultaneously, i.e. the nucleic acid fragments are combined with the first and
second sets of overhang-adaptors simultaneously with the nucleic acid ligase.

After the end of the first stage in the mapping procedure, the nucleic
acid fragments are then prepared for contacting with the second set of
overhang-adaptors. If the first overhang-adaptors were bound to a solid
support, they are now released from that support, thus facilitating the transfer
of the nucleic acid fragments to a different spatial position. The method of
separation of the nucleic acid fragments from the solid support will be
dependent on the way that the nucleic acid fragments were bound. One
example of a method of releasing the nucleic acid fragments is through the use
of a cleavage site located in the first overhang-adaptor. If the first
overhang-adaptors are DNA molecules, then a restriction endonuclease that
recognises a site in the (non-variable end of the) first overhang-adaptor may
be used. Restriction endonucleases that produce overhanging ends having a
length and orientation which correspond to any of the second ends of the
nucleic acid fragments should be avoided. Provided that the latter issue is
taken into consideration, the nucleic acid fragments may be released through
cleavage within the nucleic acid fragment itself.

Each individual population of nucleic acid fragments which were
ligated at their first ends to a first overhang-adaptor is then selectively
contacted with a second set of overhang-adaptors, i.e. each nucleic acid
fragment population is independently contacted with a second set of
overhang-adaptors. Thus for example, the population of nucleic acid
fragments which were bound to first overhang-adaptors having the first end
sequence AAAC will be contacted with the second set of overhang-adaptors
independently of the population of nucleic acid fragments
which were bound to first overhang-adaptors having the first end sequence
AAAT. In this way, the positional information which was derived from the
first stage of the mapping method is preserved.

The overhang-adaptors of the second set are similar in many ways to those
of the first step, particularly in the combinatorial nature of their
single-stranded end sequences. Hence most of the comments given above regarding
the first set of overhang-adaptors apply to the second set of
overhang-adaptors, mutatis mutandis.

Thus each overhang-adaptor of the second set comprises a nucleic acid
molecule comprising at least one 5'- or 3'-single-stranded end.

In the same manner as the first set of overhang-adaptors, the 5'- and/or
3'-single-stranded ends of the overhang-adaptors of the second set have
lengths and orientations (i.e. 5'- or 3'-) corresponding to the lengths and
orientations of the overhanging single-strands of the cleavage sites of the
chosen restriction endonucleases, wherein the second set comprises a
collection of overhang-adaptors whose single-stranded ends collectively
encode up to all possible permutations and combinations of the nucleotides
A, C, G and T.

Thus within the second set of overhang-adaptors, there will be 
individual overhang-adaptors that are capable of hybridising and ligating to 
each of the individual second ends of the nucleic acid fragments. 

Each overhang-adaptor in the said second set is spatially separable or
spatially identifiable from every other different overhang-adaptor in the
second set. Thus the position of each different overhang-adaptor will be
known and this positional information can be used to determine the sequence
of the first and second overhanging ends of any nucleic acid fragment which is
bound thereto.

The second overhang-adaptors are preferably bound to a solid support,
such as those described above. Most preferably, the solid support is a
microarray, ideally one which can be automatically read, for example by a
scanner.

After the populations of nucleic acid fragments are contacted with the
second set of overhang-adaptors, they are then contacted with a nucleic acid
ligase to cause selective ligation of the nucleic acid fragments with those
overhang-adaptors of the second set whose 5'- or 3'-single-stranded ends are
fully complementary to the second 5'- or 3'-overhanging ends of the nucleic
acid fragments. The nucleic acid ligase is preferably a DNA ligase.

Preferably, the overhang-adaptors of the second set are treated with 
phosphatase prior to use in order to reduce the occurrence of ligation between 
adjacent overhang-adaptors. 

Following the addition of nucleic acid ligase, ligation is allowed to
occur for an appropriate length of time for the single-stranded ends of the
second overhang-adaptors which are fully complementary to the second
overhanging ends of the nucleic acid fragments to be ligated thereto. In this
way, a plurality of populations of nucleic acid fragments which are ligated at
their second ends to a second overhang-adaptor are formed.

The spatial position of each of the second overhang-adaptors is
correlated with the sequences of the first and second ends of the
nucleic acid fragments. Consequently, the identification of which of the
second overhang-adaptors have nucleic acid fragments ligated thereto will
provide information on the sequences of the ends of all of the nucleic acid
fragments, thus facilitating the mapping of the target nucleic acid molecule.

The following method may be used to determine which nucleic acid 
fragments have bound to the second overhang-adaptors. 

The invention therefore also provides a method suitable for detecting
overhangs on a microarray address, the method comprising the steps of:



providing one or more single-stranded nucleic acid adaptors each
comprising a first part and a second part, the first and second parts
being contiguous with one another, the first part having a free 5'- or 3'-
end;

wherein the adaptor is preferably bound to a solid support;

contacting the adaptor with a target nucleic acid molecule having a
single-stranded overhang which is complementary with the first part of
the adaptor;

ligating the first part of the adaptor to the single-stranded overhang of
the target nucleic acid molecule;

contacting the second part of the adaptor with one or more labelled
single-stranded nucleic acid probes having a nucleotide sequence
which is complementary with the second part of the adaptor;

ligating the labelled single-stranded nucleic acid probe to the target
nucleic acid molecule;

optionally removing any unligated labelled single-stranded nucleic acid
probe and/or unligated nucleic acid molecule;

determining whether any target nucleic acid molecule has been ligated
to the first part of the adaptor by determining whether any labelled
probe is bound to the second part of the adaptor.

It will be appreciated that the "one or more single-stranded nucleic
acid adaptors" may comprise a first set of adaptors such as those defined
above. Thus in one embodiment, the adaptors form a set of adaptors, the
first parts of which are of lengths and orientations which correspond to the
lengths and orientations of the overhanging single-stranded ends of the target
nucleic acid molecules, wherein said set comprises a collection of adaptors
whose first parts collectively encode up to all possible permutations and
combinations of the nucleotides A, C, G and T, and wherein each adaptor in
the said set is preferably spatially separable from every other different
overhang-adaptor in the set. The comments given above with regard to the first
set of adaptors apply herein, mutatis mutandis.

The solid support will preferably be an array or microarray. 

The labelled single-stranded nucleic acid probes have nucleotide
sequences which are complementary with all or essentially all of the second
part of the adaptor, such that they are capable of hybridising to the adaptor
and ligating with a target nucleic acid molecule when such a target nucleic
acid molecule is bound by the first part of the adaptor.

The ligation steps may be carried out either sequentially or
simultaneously. In the latter case, the target nucleic acid molecule is
contacted with the adaptor together with the probe, and ligase is then added.

In most embodiments of the invention, the ligation steps will be carried
out as described above, i.e. sequentially. This allows competing non-labelled
probes to be used to reduce the background levels of the method.

The nucleic acid adaptor is preferably DNA. Similarly, the probe is 
also preferably DNA. The ligase is preferably a DNA ligase. The probe is 
preferably labelled with a fluorescent moiety. 

It will be appreciated that for the probe to be ligatable to the target
nucleic acid molecule, one end of the probe must be capable of hybridising to
the adaptor at a position such that the ends of the target nucleic acid molecule
and probe are contiguous.

The following is an example of this method. In this example, an
overhang of the 4 bases TGCA is to be registered. (The principle is the same
for 3'- and 5'-overhangs.) This example is illustrated in Figure 1.

Oligonucleotides with the sequence GCGGATGCAGGACGT
attached to a microarray are the basis for this example. The first (innermost)
11 nucleotides are designed to complement a fluorescent probe, while the 4
last (outermost) nucleotides will recognize the overhang of the target nucleic
acid molecule. There is evidently great freedom of choice regarding the length
and arrangement of these two components as long as the probe complements
the oligonucleotides, and the four outermost nucleotides complement the
overhang to be registered at the address of interest.

The target nucleic acid molecules are distributed over the microarray,
together with the fluorescent probe and a nucleic acid ligase with a suitable
reaction buffer. During incubation, the target nucleic acid overhang will ligate
with the oligonucleotides, provided that the 4 outermost nucleotides are
complementary. The fluorescent probes will then ligate with the target nucleic
acid overhang. By observing whether the address fluoresces after washing off
unligated probes, one will be able to determine whether the overhang TGCA
was present in the target nucleic acid molecule.

In a variation of the above method, multiple overhangs may be
registered using the same adaptor, i.e. the strategy described above can be
extended to make it possible to register multiple overhangs at the same
address. For the above example, one can for instance add the following
probes:

1) CGCCTACGTCCT

2) CGCCTACGTCC

3) CGCCTACGTC

The three probes are marked with three different fluorophores - yellow,
green, and red, respectively. If the address illuminates yellow when reading,
one knows that probe 1) has been ligated with the 3-nucleotide-long
overhang GCA. Accordingly, green fluorescence will indicate that probe 2)
has been ligated with the 4-nucleotide-long overhang TGCA, and red
fluorescence indicates that probe 3) has been ligated with the
5-nucleotide-long overhang CTGCA.
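The relationship between probe length and registered overhang in this example can be checked with a short sketch. The adaptor and probe sequences are taken from the text; the function names are hypothetical. It is assumed here that sequences are written so that a probe pairs base-by-base with the innermost part of the adaptor, and that the remaining outermost adaptor bases must pair with the overhang.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

ADAPTOR = "GCGGATGCAGGACGT"  # oligonucleotide on the microarray (from the text)
PROBES = {                    # probe sequences and colours (from the text)
    "yellow": "CGCCTACGTCCT",
    "green": "CGCCTACGTCC",
    "red": "CGCCTACGTC",
}


def pair(seq):
    """Base-by-base complement, written in the same left-to-right order as seq."""
    return "".join(COMPLEMENT[base] for base in seq)


def registered_overhang(adaptor, probe):
    """Return the overhang that must be present for `probe` to be ligated.

    The probe pairs with the innermost part of the adaptor; the adaptor bases
    left uncovered must pair with the overhang of the target molecule.
    """
    if pair(adaptor[:len(probe)]) != probe:
        raise ValueError("probe does not complement the inner part of the adaptor")
    return pair(adaptor[len(probe):])


for colour, probe in PROBES.items():
    print(colour, registered_overhang(ADAPTOR, probe))
```

The shorter the probe, the more adaptor bases remain free, so each fluorophore reports an overhang of a different length at the same address.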

Thus in some embodiments of the invention, the labelled probes
comprise a set of labelled probes, having different lengths and different labels.

In some instances, for example, if one wished to reduce the number of
addresses required, it could be useful not to register all bases in an overhang.
This may be accomplished by using an adaptor that contains one or more
universal bases (U) or by using adaptors with two or more permutations at
the same address.

It should be noted that the strategy described above may also be used
for sorting. The probe and the oligonucleotide may for example contain a
cleavage site for a restriction endonuclease. To ensure that the target DNA
molecules are released at the right time in the sorting procedure, the probe
has to be attached to the oligonucleotide when the restriction endonuclease
performs the cut.

Second Stage - Method 2. A further approach which may be used to identify 
the nucleic acid fragments which are ligated in the first stage is to use tags 
which are bound to the second overhang-adaptors. 

In this embodiment, the second stage comprises the steps of: 

(d1) optionally releasing each population of ligated nucleic acid fragments
from the solid support,

selectively contacting each population of nucleic acid fragments which
are or were ligated at their first ends to a first overhang-adaptor with a
second set of overhang-adaptors,

each overhang-adaptor of the second set comprising a nucleic acid
molecule comprising at least one 5'- or 3'-single-stranded end,

the single-stranded ends of the overhang-adaptors of the second set
being of lengths and orientations (i.e. 5'- or 3'-) corresponding to the
lengths and orientations of the overhanging single-strands of the
cleavage sites of the said restriction endonucleases,

wherein said second set comprises a collection of overhang-adaptors
whose single-stranded ends collectively encode up to all possible
permutations and combinations of the nucleotides A, C, G and T,

and wherein each different overhang-adaptor in the second set is
bound to an individual tag;

(d2) contacting the nucleic acid fragments with a nucleic acid ligase to
cause selective ligation of the nucleic acid fragments with those
overhang-adaptors of the second set whose 5'- or 3'-single-stranded
ends are fully complementary to the second 5'- or 3'-overhanging ends
of the nucleic acid fragments;

thus forming a plurality of populations of nucleic acid fragments which
are ligated at their second ends to a tagged second overhang-adaptor,
and

optionally removing the unligated nucleic acid fragments;

(d3) identifying the sequences of the first and second overhanging ends of
each of the nucleic acid fragments from the tags which are bound to
the second overhang-adaptors.

The comments given above with regard to the production and use of a 
second set of overhang-adaptors apply, mutatis mutandis, to this 
embodiment. 

In a preferred aspect of the invention, steps (b)-(d2) are carried out
simultaneously, i.e. the nucleic acid fragments are combined with the first and
second sets of overhang-adaptors simultaneously with the nucleic acid ligase.
This provides three different possibilities: 1) nucleic acid fragments that have
been ligated with the first adaptor at one end and the second adaptor at the
other end; 2) nucleic acid fragments with the first adaptor on both ends; and
3) nucleic acid fragments with the second adaptor on both ends. It is only
the fragments in the first group that will result in successful signals.
Fragments from the other groups will not cause problems, however, since
they will either give rise to no signal or be removed during washing.

With regard to this embodiment of the invention, the nucleic acid 
fragments may be bound or capable of being bound either via the first or the 
second overhang-adaptors. 

The term "tag" is used in this context to refer to a structure or
molecule which is capable of representing the sequence information of a pair
of first and second overhanging ends of any one nucleic acid fragment; and
which is distinguishable from all of the other individual tags.

The tag may be a specific DNA sequence, e.g. 50-500 bp long, that can
be amplified and then used as a probe that is hybridised to a microarray.

Alternatively, the tags may be DNA sequences of different lengths that
can be separated, for example on a gel. In this case, the first tags may be
amplified (for example by PCR) or released from the solid substrate. It will
also normally require that one gel separation is run per well. Performing
one gel separation per well avoids the need for the tags to represent
both overhangs.

Another system for the identification of the tagged second
overhang-adaptors is to have tags comprising a group of hybridisation sequences
to which a plurality of labelled probes may selectively be hybridised, each group
of hybridisation sequences being representative of the sequence of the second
overhanging end of the nucleic acid fragment, and each labelled probe being
representative of one or more of the nucleotides present in that second
overhanging end of the nucleic acid fragment.

Using this system, the group of hybridisation sequences may be read in
a number of cycles, the number of cycles in most cases corresponding
essentially to the number of overhanging nucleotides n in the second
overhanging end of the nucleic acid fragments.

This stage therefore comprises the steps of:

(d1) optionally releasing each population of ligated nucleic acid fragments
from the solid support,

selectively contacting each population of nucleic acid fragments which
are or were ligated at their first ends to a first overhang-adaptor with a
second set of overhang-adaptors,

each overhang-adaptor of the second set comprising a nucleic acid
molecule comprising at least one 5'- or 3'-single-stranded end,

the single-stranded ends of the overhang-adaptors of the second set
being of lengths and orientations (i.e. 5'- or 3'-) corresponding to the
lengths and orientations of the overhanging single-strands of the
cleavage sites of the said restriction endonucleases,

wherein said second set comprises a collection of overhang-adaptors
whose single-stranded ends collectively encode up to all possible
permutations and combinations of the nucleotides A, C, G and T,

wherein each different overhang-adaptor in the second set is bound to
an individual tag;

wherein the tag comprises a plurality of hybridisation sequences, each
hybridisation sequence being representative of one or more of the
nucleotides in the second overhanging end of the nucleic acid
fragment;

(d2) contacting the nucleic acid fragments with a nucleic acid ligase to
cause selective ligation of the nucleic acid fragments with those
overhang-adaptors of the second set whose 5'- or 3'-single-stranded
ends are fully complementary to the second 5'- or 3'-overhanging ends
of the nucleic acid fragments;

thus forming a plurality of populations of nucleic acid fragments which
are ligated at their second ends to a tagged second overhang-adaptor,
and

optionally, removing the unligated nucleic acid fragments;

(d3) contacting the tagged populations of nucleic acid fragments with a set
of labelled probes, each set of labelled probes comprising at least one
probe which is capable of binding specifically to at least one of the
hybridisation sequences;

(d4) identifying which labelled probe has bound to the hybridisation
sequence and identifying the spatial position of the bound probes;

(d5) removing the labelled probe from the hybridisation sequence; and

(d6) repeating steps (d3)-(d4), and optionally (d5), until the sequence of
the overhang of the second end of the nucleic acid fragment has been
determined;

(e) comparing the sequences of the ends of the nucleic acid fragments in
order to produce a map of the target nucleic acid molecule.

With regard to this embodiment of the invention, the nucleic acid
fragments may be bound or capable of being bound either via the first or the
second overhang-adaptors.

The hybridisation sequences may comprise a plurality of discrete or 
overlapping sequences to which separate labelled probes may be hybridised. 
Most preferably, the hybridisation sequences are single-stranded DNA 
sequences. 

The labelling of the probes may be by any means which is sufficient to
distinguish the labelled probes from each other. Examples of labels include
fluorescent moieties.

In most cases, this system will require one labelled probe per
overhanging end nucleotide. In some circumstances, however, probes may be
used which represent combinations of two or more nucleotides. For example,
probes may represent A, C, G or T; or the probes may represent AA, AC,
AG, AT, CA, etc. In the latter case, fewer cycles will be required, although a
larger number (e.g. 16 in this case) of different distinguishable labels will be
required.
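The trade-off between the number of cycles and the number of distinguishable labels can be put as simple arithmetic. The sketch below assumes that every probe represents the same number of nucleotides k, so that reading an n-nucleotide overhang needs about n/k cycles (rounded up) and 4^k labels; the function name is hypothetical.

```python
def readout_cost(overhang_length, nucleotides_per_probe):
    """Return (cycles, labels) for reading an overhang k nucleotides at a time."""
    # Ceiling division: a partly filled last group still needs its own cycle.
    cycles = -(-overhang_length // nucleotides_per_probe)
    # One distinguishable label per possible combination of k nucleotides.
    labels = 4 ** nucleotides_per_probe
    return cycles, labels


print(readout_cost(4, 1))  # one nucleotide per probe
print(readout_cost(4, 2))  # dinucleotide probes (AA, AC, AG, ...)
```

For a four-nucleotide overhang this gives 4 cycles with 4 labels, or 2 cycles with 16 labels, in line with the figure of 16 quoted above.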

It is also possible to use a binary system where each nucleotide is
represented by two probes.

An example of the above method is given below. In this example, a 
four nucleotide overhang in the overhang-adaptor may, for example, be read 
with the tag illustrated below: 

Overhang-    Hybridisation sequence
adaptor

CGAT:::::: : Sequence 1C : Sequence 2G : Sequence 3A : Sequence 4T

The tag that recognises the overhang GCTA contains a
complementary overhang shown to the left and four hybridisation sequences
shown to the right. After the overhang-adaptor has been ligated to the
overhang of the second end of the nucleic acid fragment and attached to the
substrate, four hybridisation cycles are performed:

First cycle:

:::GCTA:::::: * Green Probe
:::CGAT:::::: : Sequence 1C : Sequence 2G : Sequence 3A : Sequence 4T

The labelled green probe binds to the hybridisation sequence that is
representative of a C at the first position. Labelled probes which are
representative of A, G or T at the first position will not bind.

Second cycle:

:::GCTA:::::: * Red Probe
:::CGAT:::::: : Sequence 1C : Sequence 2G : Sequence 3A : Sequence 4T

The labelled red probe binds to the hybridisation sequence that is
representative of a G at the second position. Labelled probes which are
representative of A, C or T at the second position will not bind.

Third cycle:

:::GCTA:::::: * Yellow Probe
:::CGAT:::::: : Sequence 1C : Sequence 2G : Sequence 3A : Sequence 4T

The labelled yellow probe binds to the hybridisation sequence that is
representative of an A at the third position. Labelled probes which are
representative of C, G or T at the third position will not bind.

Fourth cycle:

:::GCTA:::::: * Blue Probe
:::CGAT:::::: : Sequence 1C : Sequence 2G : Sequence 3A : Sequence 4T

The labelled blue probe binds to the hybridisation sequence that is
representative of a T at the fourth position. Labelled probes which are
representative of A, C or G at the fourth position will not bind.

For each position, there exist four different sequence alternatives
representing each of the four nucleotides that can be in the position. The
sequence alternatives representing a given nucleotide, such as A, differ
between the different positions, allowing each position to be analysed
independently. In this example, the overhang contains a C in position 1 and
hence the hybridisation sequence representing C is used. After adding the four
candidate probes that can be hybridised to position 1, a green probe
representing C is attached. The probe is washed away after reading and four
new candidate probes that can be attached to position 2 are added to the
solution, and so on.
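The cyclic read-out in this example can be simulated in a few lines. This is only a sketch under stated assumptions: the colour of each probe is taken to depend on the nucleotide alone (green for C, red for G, yellow for A, blue for T, as in the four cycles above), and a hybridisation sequence is modelled as a (position, nucleotide) pair such as (1, "C") for "Sequence 1C". All names are hypothetical.

```python
# Assumed colour code, following the four cycles of the example.
COLOUR_OF = {"C": "green", "G": "red", "A": "yellow", "T": "blue"}
BASE_OF = {colour: base for base, colour in COLOUR_OF.items()}


def make_tag(adaptor_overhang):
    """One hybridisation sequence per position, e.g. (1, 'C') for 'Sequence 1C'."""
    return [(pos, base) for pos, base in enumerate(adaptor_overhang, start=1)]


def run_cycle(tag, position):
    """Offer the four candidate probes for `position`; return the colour that binds."""
    for base in "ACGT":
        if (position, base) in tag:
            return COLOUR_OF[base]
    return None  # no probe bound at this position


def read_tag(tag):
    """Run one cycle per position and translate the colours back into bases."""
    colours = [run_cycle(tag, pos) for pos in range(1, len(tag) + 1)]
    return colours, "".join(BASE_OF[c] for c in colours)


colours, overhang = read_tag(make_tag("CGAT"))
print(colours)   # colour observed in cycles 1 to 4
print(overhang)  # overhang of the adaptor read from the tag
```

Because the sequence alternatives differ between positions, a probe for position 2 can never bind the sequence for position 1, which is what allows the cycles to be read independently.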

The following embodiment provides a method for making maps with
multiple restriction endonucleases by performing parallel mapping reactions.

If it is desired to make maps of nucleic acid molecules containing
multiple restriction endonuclease sites, it is possible to perform a mapping
reaction with enzymes that generate different overhang lengths and qualities
(for example EarI, BbvI and BaeI). It will, however, be impossible to
distinguish between restriction endonucleases that produce the same kind of
overhang, for example BbvI and Alw26I (both of which produce 4-nucleotide
5'-overhangs), if they are used in the same mapping reaction.

The invention therefore provides a method where several mapping
reactions are carried out in parallel and, by combining the information from
them all, it is possible to generate a consensus map where it is possible to
distinguish between restriction endonucleases that produce the same kind of
overhang.

The invention therefore provides a method for mapping a nucleic acid 
molecule comprising the steps of: 

(A) treating the nucleic acid molecule with a first set of Type IIs restriction
endonucleases to produce one or more nucleic acid fragments, each of
the restriction endonucleases in the first set producing different
overhanging single-stranded ends to the other restriction
endonucleases in the first set,

and determining the sequences of the overhanging ends of the nucleic
acid fragments produced thereby;

(B) treating the nucleic acid molecule with a second set of Type IIs
restriction endonucleases to produce one or more nucleic acid
fragments, the second set comprising at least one Type IIs restriction
endonuclease which was not used in step (A) but which has a cleavage
site which is the same as one or more of the Type IIs restriction
endonucleases used in step (A);

and determining the sequences of the overhanging ends of the nucleic
acid fragments produced thereby;

(C) optionally treating the nucleic acid molecule with one or more further
sets of Type IIs restriction endonucleases to produce one or more
nucleic acid fragments,

and determining the sequences of the overhanging ends of the nucleic
acid fragments produced thereby;

(D) treating the nucleic acid molecule simultaneously with the Type IIs
restriction endonucleases from all of the sets to produce one or more
nucleic acid fragments,

and determining the sequences of the overhanging ends of the nucleic
acid fragments produced thereby;

(E) producing a map of the nucleic acid molecule by using the information
derived from steps (A)-(D).



The nucleic acid molecule in this embodiment is preferably a
double-stranded DNA molecule. In some embodiments of this invention, step (C)
is omitted. In other embodiments, the number of restriction endonucleases
used in each set is independently 2, 3 or 4. Thus for example, step (A) may
be carried out with 3 restriction endonucleases, step (B) may be carried out
with 3 restriction endonucleases and step (D) may be carried out with 6
restriction endonucleases.

The determining of the sequences of the overhanging ends of the 
nucleic acid fragments produced in steps (A)-(D) is preferably carried out 
using a method disclosed herein. 

An example of this method is given below: 





                     5', 3 nucleotide   5', 4 nucleotide   3', 5 nucleotide
                     overhang           overhang           overhang

Mapping reaction 1   EarI               BbvI               BaeI
Mapping reaction 2   SapI               Alw26I             BplI
Mapping reaction 3   EarI + SapI        BbvI + Alw26I      BaeI + BplI



Map 1:

EarI    BaeI      EarI    BbvI
(GCT)   (TCTTT)   (TTT)   (GCCT)

Map 2:

Alw26I   Alw26I   SapI    BplI
(GTGC)   (TACA)   (TGT)   (AAAGA)

Map 3:

EarI/   BbvI/    BbvI/    BaeI/     EarI/   EarI/   BaeI/     BbvI/
SapI    Alw26I   Alw26I   BplI      SapI    SapI    BplI      Alw26I
(GCT)   (GTGC)   (TACA)   (TCTTT)   (TTT)   (TGT)   (AAAGA)   (GCCT)

Consensus map:

EarI    Alw26I   Alw26I   BaeI      EarI    SapI    BplI      BbvI
(GCT)   (GTGC)   (TACA)   (TCTTT)   (TTT)   (TGT)   (AAAGA)   (GCCT)

This strategy can of course be expanded further with additional
restriction endonucleases and mapping reactions. It is therefore possible to
make detailed maps with 10-20 different Type IIp and IIs restriction
endonucleases. In addition to that, it is also possible to place as many as
10-40 different ordinary Type II enzymes (such as EcoRI and HindIII) into the
map as soon as the framework with Type IIp and IIs endonucleases is established.
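The way the three maps of the example combine into the consensus map can be sketched as follows. The combined reaction (Map 3) supplies the order of all sites but leaves the enzyme ambiguous; Maps 1 and 2 each name the enzyme for the sites they contain. The sketch assumes, for illustration only, that every overhang sequence occurs once in the molecule so that it can serve as a lookup key; the function name is hypothetical.

```python
# Site lists from the example, as (enzyme, overhang) in map order.
MAP1 = [("EarI", "GCT"), ("BaeI", "TCTTT"), ("EarI", "TTT"), ("BbvI", "GCCT")]
MAP2 = [("Alw26I", "GTGC"), ("Alw26I", "TACA"), ("SapI", "TGT"), ("BplI", "AAAGA")]
# Order of overhangs observed in the combined reaction (Map 3).
MAP3_ORDER = ["GCT", "GTGC", "TACA", "TCTTT", "TTT", "TGT", "AAAGA", "GCCT"]


def consensus(order, *single_maps):
    """Assign an enzyme to every site of the combined map.

    Assumes each overhang sequence is unique across the single-reaction maps.
    """
    enzyme_for = {}
    for single_map in single_maps:
        for enzyme, overhang in single_map:
            if overhang in enzyme_for:
                raise ValueError(f"overhang {overhang} is not unique")
            enzyme_for[overhang] = enzyme
    return [(enzyme_for[overhang], overhang) for overhang in order]


for enzyme, overhang in consensus(MAP3_ORDER, MAP1, MAP2):
    print(f"{enzyme:7s} ({overhang})")
```

Where the same overhang occurs more than once, the order of sites within each single-reaction map would have to be used as well; that case is not covered by this sketch.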

A further embodiment of the invention provides a method of 
mapping a target nucleic acid molecule, the method comprising the steps of: 

(a) treating the target nucleic acid molecule with one or more
restriction endonucleases to produce one or more nucleic acid
fragments having first and second 5'- or 3'-single-stranded
overhanging ends,

(b) adding the nucleic acid fragments to a first set of
overhang-adaptors,

each overhang-adaptor of the first set comprising a nucleic acid
molecule comprising at least one 5'- or 3'-single-stranded end,

the single-stranded ends of the first overhang-adaptors being of
lengths and orientations (i.e. 5'- or 3'-) corresponding to the lengths
and orientations of the overhanging single-strands of the cleavage
sites of the said restriction endonucleases,

wherein said first set comprises a collection of overhang-adaptors
whose single-stranded ends collectively encode up to all possible
permutations and combinations of the nucleotides A, C, G and T
at all positions in the single-stranded ends except one or more
positions, the latter positions being taken by universal nucleotides,

and wherein each overhang-adaptor in the said first set is spatially
separable from every other different overhang-adaptor in the first
set;

(c1) contacting the said nucleic acid fragments with a nucleic acid ligase
to cause selective ligation of the nucleic acid fragments with those
overhang-adaptors whose 5'- or 3'-single-stranded ends are
complementary to the 5'- or 3'-overhanging single-stranded ends of
the nucleic acid fragments,

thus forming a plurality of separable populations of nucleic acid
fragments which are ligated at their first ends to a first
overhang-adaptor, and then

optionally, removing any nucleic acid fragments which are not
ligated to first overhang-adaptors;

(c2) releasing the nucleic acid fragments which are bound at their first
ends with a restriction endonuclease which creates a new first
overhanging single-stranded end in the nucleic acid fragment which
comprises the nucleotide or nucleotides in the nucleic acid
fragments which corresponded to the universal nucleotides;

(d1) selectively contacting each released population of nucleic acid
fragments with a second set of overhang-adaptors,

each overhang-adaptor of the second set comprising a nucleic acid
molecule comprising at least one 5'- or 3'-single-stranded end,

the single-stranded ends of the overhang-adaptors of the second set
being of lengths and orientations (i.e. 5'- or 3'-) corresponding to
the lengths and orientations of the overhanging single-strands of the
cleavage sites of the said restriction endonuclease,

wherein said second set comprises a collection of
overhang-adaptors whose single-stranded ends collectively encode up to all
possible permutations and combinations of the nucleotides A, C, G
and T,

and wherein each overhang-adaptor in the said second set is
spatially distinguishable from every other different
overhang-adaptor in the second set;

(d2) contacting the nucleic acid fragments with a nucleic acid ligase to
cause selective ligation of the nucleic acid fragments with those
overhang-adaptors of the second set whose 5'- or 3'- single-stranded
ends are fully complementary to the second 5'- or 3'-overhanging
ends of the nucleic acid fragments;

thus forming a plurality of populations of nucleic acid fragments 
which are ligated at their second ends to a second overhang- 
adaptor, and then 

optionally, removing any unbound nucleic acid fragments;



(d3) contacting the ligated nucleic acid fragments with labelled-adaptors 
which bind selectively to the new first overhanging end on the basis 
of the nucleotide or nucleotides in the new first overhanging end of 
the nucleic acid fragments which corresponded to the universal 
nucleotides;

(d4) identifying the sequences of the first and second overhanging ends 
of each of the nucleic acid fragments from the spatial positions of 
the second overhang-adaptors to which the nucleic acid fragments 
are ligated, and from the labels which are attached to the first ends
of the nucleic acid fragments; and

(e) comparing the sequences of the ends of the nucleic acid fragments
in order to produce a map of the target nucleic acid molecule.

This method has the advantage that the initial sorting is carried out
using a smaller number of first overhang-adaptors. The missing sequence
information is retrieved by making use of labelled-adaptors in the second stage.
This embodiment is illustrated in Figures 2 and 3.

In this embodiment, the first set comprises a collection of
overhang-adaptors whose single-stranded ends collectively encode up to all
possible permutations and combinations of the nucleotides A, C, G and T at
all positions in the single-stranded ends except one or more positions, the
latter positions being taken by universal nucleotides. In this context,
"universal nucleotides" are nucleotides which are capable of base-pairing with
any of the nucleotides A, C, G and T.
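
The effect of universal nucleotides on the size of the first set can be illustrated with a short sketch. It is not part of the described method; it simply enumerates overhang patterns, assuming a 4-nucleotide overhang and writing a universal position as `N`. The function name and the choice of positions are hypothetical.

```python
from itertools import product

BASES = "ACGT"

def first_set_patterns(overhang_length, universal_positions):
    """List adaptor overhang patterns, writing 'N' at universal positions."""
    fixed = [i for i in range(overhang_length) if i not in universal_positions]
    patterns = []
    for combo in product(BASES, repeat=len(fixed)):
        pattern = ["N"] * overhang_length
        for position, base in zip(fixed, combo):
            pattern[position] = base
        patterns.append("".join(pattern))
    return patterns

# Assumed example: 4-nucleotide overhangs with zero, one or two universal positions.
full_set = first_set_patterns(4, set())
one_universal = first_set_patterns(4, {3})
two_universal = first_set_patterns(4, {2, 3})
print(len(full_set), len(one_universal), len(two_universal))  # 256 64 16
```

Under these assumptions each universal position reduces the number of first overhang-adaptors by a factor of four.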

Preferably, universal nucleotides are present at one or two positions
in the single-stranded ends of the first overhang-adaptors.

The ligated nucleic acid fragments are released from the first
overhang-adaptors by cleavage with a restriction endonuclease which creates a
new first overhanging single-stranded end in the nucleic acid fragment. It
should be noted that the new first overhanging end in the nucleic acid
fragment may be the same length and orientation as the initial first
overhanging end. Preferably this is done with a Type IIp or Type IIs
restriction endonuclease, for example, one having a recognition site in the
first overhang-adaptor. The new first overhanging end comprises the
nucleotide or nucleotides in the nucleic acid fragments which corresponded to
the universal nucleotides, i.e. those nucleotides which hybridised opposite or
base-paired with the universal nucleotides.

The comments given above with regard to overhang-adaptors of the 
first and second sets apply herein, mutatis mutandis. 

The binding of the second overhanging ends of the nucleic acid
fragments to the second overhang-adaptors will provide the majority of the
sequence information on the first overhanging end of the nucleic acid
fragment and all of the sequence information on the second overhanging end.
The remaining information on the sequence of the first overhanging end is
obtained through the use of labelled-adaptors.

Labelled-adaptors are used which bind selectively to the new first
overhanging ends of the nucleic acid fragments on the basis of the nucleotide
or nucleotides in the new first overhanging ends of the nucleic acid fragments
which corresponded to the universal nucleotides in the first
overhang-adaptors. Thus the labelled adaptors will provide the information on the
sequence of the first overhanging ends which was not provided by the binding
of the nucleic acid fragment to the first overhang-adaptor.

Preferably, the set of labelled-adaptors comprises a set of four 
labelled adaptors, wherein the labels are distinguishable from one another. 
Preferably, the labels are fluorescent moieties.

In the context of this invention, labels are those which directly or
indirectly allow detection and/or determination by the generation of a signal.
Such labels include for example radiolabels, chemical labels (e.g. EtBr,
TOTO, YOYO and other dyes), chromophores or fluorophores (e.g. dyes
such as fluorescein and rhodamine), or reagents of high electron density such
as ferritin, haemocyanin or colloidal gold. Alternatively, the label may be an
enzyme, for example peroxidase or alkaline phosphatase, wherein the
presence of the enzyme is visualized by its interaction with a suitable entity,
for example a substrate. The label may also form part of a signalling pair
wherein the other member of the pair may be introduced into close proximity,
for example, a fluorescent compound and a quench fluorescent substrate may
be used.

A label may also be provided on a different entity, such as an
antibody, which specifically recognizes at least a region of the molecule to be
identified. If the molecule to be identified is a polynucleotide, one way in
which a label may be introduced is, for example, to bind a suitable binding
partner carrying a label, e.g. fluorescently labelled probes or DNA-binding
proteins.

Once the sequence information has been accumulated, a computer
program may then be used to assemble the sequence pieces into the final
sequence.
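
As a rough illustration of what such an assembly program might do, the sketch below orders fragments into a linear map by chaining their end labels. It is a minimal example under stated assumptions, not the program referred to in the text: every end label is assumed to be unique, both ends of each fragment are assumed to be written on the same strand so that neighbouring fragments share the label of the site between them, and the fragment data are invented.

```python
def order_fragments(fragments):
    """Order (left_end, right_end) fragments into one linear chain.

    Assumes every end label is unique and that neighbouring fragments
    share the label of the cleavage site lying between them.
    """
    by_left = {left: (left, right) for left, right in fragments}
    right_ends = {right for _, right in fragments}
    starts = [f for f in fragments if f[0] not in right_ends]
    if len(starts) != 1:
        raise ValueError("expected exactly one fragment with an unmatched left end")
    ordered = [starts[0]]
    # Follow each right end to the fragment whose left end carries the same label.
    while ordered[-1][1] in by_left and len(ordered) < len(fragments):
        ordered.append(by_left[ordered[-1][1]])
    if len(ordered) != len(fragments):
        raise ValueError("fragments do not form a single chain")
    return ordered

# Hypothetical end labels, invented for illustration only.
fragments = [("GGCA", "TTAC"), ("start", "ACGT"), ("TTAC", "end"), ("ACGT", "GGCA")]
chain = order_fragments(fragments)
print(chain)
```

A practical program would also need to handle repeated overhangs and missing fragments, which this sketch deliberately ignores.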

Kits for performing the mapping methods described herein form a
further aspect of the invention. Thus viewed from a further aspect, the
present invention provides a kit for mapping a target nucleic acid molecule
comprising a set of first overhang-adaptors as described herein, optionally
attached to one or more solid supports; a set of second overhang-adaptors as
described herein; and one or more restriction endonucleases for use with one
or more of the methods described herein.

Optionally the kit may contain other appropriate components
selected from the list including vectors into which the target molecules may be
ligated, ligases, enzymes necessary for inactivation and activation of restriction
or ligation sites, primers for amplification and/or appropriate enzymes, buffers
and solutions. Appropriate labelling means may also be included in such kits.

The use of such kits for mapping target nucleic acid molecules forms
a further aspect of the invention.

To increase the statistical capacity of the method, it is possible to
record restriction endonuclease cleavage sites located between the two
overhanging ends of a nucleic acid fragment. One strategy is to free DNA
molecules from the wells used in the first sorting step using restriction
endonucleases which do not have binding sites in the overhang-adaptors.
There must be cleavage sites in the actual target DNA for the DNA molecules
to be freed and made available for the next step. This procedure may of
course be repeated with several restriction endonucleases. It is also possible
to cut with several endonucleases at once. If labelled adaptors are then used
that recognise and label the different overhangs with different colours, it will
be possible to record which enzyme has freed the nucleic acid fragment.

Thus the invention relates to a method wherein after the nucleic
acid fragments are selectively ligated to the first overhang-adaptors, they are
treated with a restriction endonuclease; and a labelled adaptor is then used to
determine whether the restriction endonuclease has cut the nucleic acid
fragment. The labelled adaptor may bind either to the cut end of the released
nucleic acid fragment or to the cut end of the bound nucleic acid fragment.

The presence or absence of a restriction endonuclease cleavage site
in a target nucleic acid molecule may also be determined by immobilising the
target nucleic acid molecule on a solid support, for example, a microarray,
and labelling the free end of the target nucleic acid, for example with a
fluorescent moiety. The target nucleic acid molecule may then be treated with
a restriction endonuclease. If the label disappears after the restriction
endonuclease treatment, then it can be said that the restriction endonuclease
cuts in the target nucleic acid. This method is illustrated in Figure 4.

The invention therefore provides a method for determining the
presence or absence of a restriction endonuclease cleavage site within a target
nucleic acid comprising the steps of immobilising the target nucleic acid
molecule on a solid support; labelling the free end of the target nucleic acid;
treating the target nucleic acid with a restriction endonuclease; and then
determining the presence or absence of the label after treatment.

This procedure may be repeated with several restriction
endonucleases. Similarly, it is possible to cut with several restriction
endonucleases at once and label the different overhangs with different
colours.

It is also possible to extend the last-mentioned principle to
sequencing. In this method, one end of a target nucleic acid molecule is
ligated with a linker containing a Type IIp or Type IIs restriction endonuclease
recognition site which creates an overhang in the actual target nucleic acid
molecule of one or more bases. The overhang in the target nucleic acid
molecule is then ligated with a labelled adaptor that recognises one or more
overhanging bases. It is possible, for example, to use four different labelled
adaptors that recognise adenine, cytosine, guanine and thymine, and which
are labelled with four different fluorescent colours. The fluorescent colour of
the address thus provides information on which base is in the position being
analysed.

If the labelled adaptors contain a cleavage site for a Type IIp or
Type IIs restriction endonuclease that generates a new overhang in the target
sequence, and which has been displaced in relation to the first overhang, the
process can be repeated one or more times, providing sequence information
in towards the centre of the target nucleic acid sequence in a controlled
manner.

There is provided therefore a method of sequencing a target nucleic
acid molecule comprising the steps of:

(i) ligating the target nucleic acid molecule with a linker nucleic acid,
the linker nucleic acid comprising a recognition site for a Type IIp
or Type IIs restriction endonuclease which will cleave the target
nucleic acid molecule;

(ii) treating the target nucleic acid molecule with a Type IIp or Type IIs
restriction endonuclease to produce one or more nucleic acid
fragments having single-stranded overhanging ends;

(iii) ligating one or more of the target nucleic acid fragments with a set
of labelled adaptors which specifically recognise one or more of the
nucleotides in the single-stranded overhanging ends of the nucleic
acid fragments, wherein the labelled adaptors comprise a
recognition site for a Type IIp or Type IIs restriction endonuclease
which will cleave the target nucleic acid molecule at a position one
or more nucleotides 5'- or 3'- to the first cleavage site;

(iv) identifying which labelled adaptors have bound to the nucleic acid
fragments, thus providing information on the nucleotide sequence
of at least part of the overhanging ends of the target nucleic acid
fragment;

(v) optionally, repeating steps (ii)-(iv) one or more times.
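
The cyclic character of steps (ii)-(iv) can be pictured with a toy simulation. This is only a sketch under simplifying assumptions: one base is interrogated per cycle, each cycle moves one position further into the target, ligation and cleavage are treated as error-free, and the colour assignments are arbitrary. None of the names below come from the text.

```python
# Arbitrary colour assignment for the four labelled adaptors (assumed).
LABEL_COLOURS = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
COLOUR_TO_BASE = {colour: base for base, colour in LABEL_COLOURS.items()}

def simulate_cycles(target, cycles, step=1):
    """Return the label colours observed over successive cycles.

    Each cycle exposes the base `step` positions further in from the free end.
    """
    colours = []
    for cycle in range(cycles):
        position = cycle * step
        if position >= len(target):
            break  # the target has been read to its end
        colours.append(LABEL_COLOURS[target[position]])
    return colours

def decode(colours):
    """Translate the recorded colours back into a base sequence."""
    return "".join(COLOUR_TO_BASE[colour] for colour in colours)

observed = simulate_cycles("GATTACA", cycles=5)
read = decode(observed)
print(observed, read)
```

With the invented target above, five cycles recover the first five bases; a real overhang may expose several bases per cycle, which this model does not attempt to represent.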



Preferably, the nucleic acid molecule is a DNA molecule, most
preferably a double-stranded DNA molecule.

In a particularly preferred embodiment, the target DNA is 
immobilised at one end, for example on an array, and the linkers and labelled 
adaptors are bound to the other free end. 

The above strategy may provide several important benefits over
methods known in the prior art. Firstly, the use of microarrays means that the
analyses are not based on signals from single molecules, but from a large set
of equal-length target sequences. Stronger signals may therefore be obtained
compared with scanning strategies that are based on single molecules.
Furthermore, it is easier to carry out many cycles as loss of target DNA can
be tolerated.

It should be noted that in all embodiments of the invention, the
nucleic acid fragments may be amplified by a linear or exponential PCR using
the first and/or second overhang-adaptors as PCR primers.



LEGENDS TO FIGURES

Figure 1 Method for registering overhangs on a microarray address.
Figure 2 Example of the use of multiple coloured fluorescence adaptors in
order to reduce the number of addresses required.
Figure 3 Example of the identification of internal cleavage sites, illustrating
the presence of the doublet AAAT-CAGA.
Figure 4 Example of how the fluorescent colour disappears from an address
if the DNA fragment contains a cleavage site for the restriction
endonucleases being used.
Figure 5 Digestion of a target nucleic acid molecule with Type IIs restriction
endonucleases FokI and HgaI.
Figure 6 Example of the first stage of the mapping procedure using a
restriction endonuclease which produces a 4-nucleotide single-stranded
overhang.
Figure 7 Example of an area from a microarray, illustrating the presence
of the doublet TTTA-GTCT.

The following examples are given by way of illustration only and 
should not be read as limiting the invention in any way. 


EXAMPLES 

Example 1 - Mapping method 

The mapping principle is illustrated by the Type IIs restriction
endonucleases HgaI and FokI as shown in Figure 1.

In the first step, the target sequence was cut with HgaI and FokI to
form five fragments each with two unique overhanging ends. This included a
FokI site with an ACGT overhang furthest to the left. This was followed by
an HgaI site, three FokI sites and, finally, an HgaI site furthest to the right.
The map produced contained the internal sequence of the two restriction
endonucleases, together with details of the sequences and positions of the
overhanging ends.

Example 2 - Scanning using microarrays

The mapping procedure was carried out with FokI, i.e. an enzyme
that creates 4-nucleotide overhangs. With such an enzyme, 256 overhang
permutations can be generated, which in turn means that there are 256 x 256
permutations of overhanging end pairs. A microarray with 65,536 addresses
was thus used to identify the overhanging end pairs present in a solution.
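
The counts quoted above follow directly from enumerating all 4-nucleotide sequences over A, C, G and T. The short sketch below is an illustration only; the alphabetical ordering it produces is an assumption chosen for convenience.

```python
from itertools import product

BASES = "ACGT"

# All possible 4-nucleotide overhangs, in alphabetical order (assumed ordering).
overhangs = ["".join(p) for p in product(BASES, repeat=4)]

# Every fragment carries two overhangs, so the number of ordered pairs is 256 x 256.
pair_count = len(overhangs) ** 2

print(len(overhangs), pair_count)  # 256 65536
print(overhangs[:4])               # ['AAAA', 'AAAC', 'AAAG', 'AAAT']
```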

Several strategies were envisaged for assigning the overhang pairs to
the correct addresses in the microarray. In this case, ligations and a two-step
sorting procedure as illustrated in Figure 6 were used.

We started with a microtitre plate with 256 wells including
overhang adaptors anchored to the wells' substrates. Well 1 contained
adapters with AAAA overhangs, well 2 contained adapters with AAAC
overhangs etc., so that each overhang permutation had its own well. The
solution with the overhang pairs was then distributed evenly between the
wells. Ligase was added so that the pairs with overhangs complementing the
overhang-adapters in the respective wells were ligated (the overhang pairs
were treated with phosphatase initially in order to reduce the occurrence of
ligations between overhang pairs). Then, after washing the well, we were left
with just two overhang pairs which had been ligated. These were then freed,
by means of a cleavage site located in the overhang adaptors, so we could then
proceed to the next sorting round.

It should be noted that the overhangs that were ligated to the
overhang adaptors had now been inactivated. Freed DNA molecules from
well 1 were then added to area 1 on a microarray, DNA molecules from well
2 to area 2, etc. The 256 areas on the microarray were physically separated
from each other. Furthermore, each area was divided into 256 addresses,
address no. 1 comprising overhang adaptors with AAAA overhangs, address
no. 2 with AAAC overhangs etc. We then incubated the DNA solution with
ligase, and the overhang pairs with TTTT overhangs ligated to address 1, and
so on.

After the overhang pairs were ligated to their respective addresses, 
the microarray could be scanned. The information was scanned as shown in 
Figure 7. 
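
Reading a scanned signal back into a pair of overhangs amounts to complementing the adaptor overhang of the well (area) and of the address where the signal appears. The sketch below is a minimal illustration, assuming that the fragment overhang is written base by base opposite the adaptor overhang, as in the AAAA/TTTT pairing described above. The function names and the sample adaptor sequences are chosen for illustration.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def paired_overhang(adaptor_overhang):
    """Base-by-base complement, written in the same left-to-right order as the adaptor."""
    return "".join(COMPLEMENT[base] for base in adaptor_overhang)

def decode_signal(well_adaptor, address_adaptor):
    """Return the two fragment overhangs implied by a signal in a given area and address."""
    return paired_overhang(well_adaptor), paired_overhang(address_adaptor)

# Sample inputs: an area fed from a well with AAAT adaptors, and an address with CAGA adaptors.
first_end, second_end = decode_signal("AAAT", "CAGA")
print(first_end, second_end)  # TTTA GTCT
```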

We recorded a light signal at address 85, area 4, hence we knew
that one overhang must be TTTA because all the DNA in this area was
sorted into well no. 4 where overhang adaptors with AAAT overhangs were
used. Similarly we could ascertain that the other overhang must have been
GTCT as the overhang adaptors at address 85 have CAGA overhangs.

Example 3 - Mapping of pBluescript 

In this example, a combination of Type IIs and IIp restriction
endonucleases was used to map DNA sequences.

In this procedure, the mapping of HgaI, FokI and BstXI on
pBluescript is used as an example. In the procedure, pBluescript is digested
with the three enzymes, which generates 9 fragments. A complete set of
adapters is ligated to the overhangs (in this example, they are called left and
right adapters for convenience). The left adapters will recognize the left
overhangs and the right adapters, the right overhangs. Ligation is performed
in 9 tubes where each tube contains a specific biotinylated left adapter
corresponding to a specific overhang. By adding streptavidin-coated beads to
the wells a sorting based on the left overhangs is performed. After extensive
washing to remove unbound fragments, a linear PCR (or alternatively,
exponential PCR) is performed on the right adapter which is ligated on to the
other overhang. It should be noted that the sequence of the right adapter,
except for a common primer site, is specific for the right overhang it
recognizes. The adapter, and thus, the overhang sequence, can be determined
by hybridization to its counterpart on a microarray. Based on the overhang
quality, the order of restriction endonucleases can be mapped on pBluescript.

Reagents:
pBluescript SKII+
HgaI (NEB)
FokI (NEB)
BstXI (NEB)
Biotinylated left adapters (~30 bp)
Non-biotinylated left adapters (~30 bp)
Right adapters (~90 bp)
T4 DNA ligase buffer (NEB)
T4 DNA ligase (NEB)
Taq polymerase buffer (Dynazyme)
dNTPs
Taq (Dynazyme)
Polylysine-coated slides
Cy3-labelled antisense right adapter oligo
Hybridization solution
Streptavidin-coated M-270 beads
Bind and wash buffer (B&W) (10 mM Tris-Cl, pH 7.5, 1 mM EDTA, 2 M NaCl)

Protocol:

Step I: Digestion of pBluescript:

First digestion:

pBluescript: 18 µg
NEB3 buffer: 1X
BstXI: 36 U
Volume: 50 µl

Incubation at 55°C for 1 hr. Ethanol precipitation to change buffer.

Second digestion:

BstXI-digested pBluescript: 18 µg
NEB4: 1X
HgaI: 36 U
FokI: 36 U
Volume: 270 µl

Incubation at 37°C for 1 hr. Ethanol precipitation to concentrate the
sample. Dissolution of the sample to 1 µg/µl by adding 18 µl of TE. This
concentration equals 0.52 pmol/µl of pBluescript.
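
The conversion between mass and molar amount of plasmid used here can be checked with a short calculation. The sketch assumes a plasmid length of about 2961 bp and an average mass of 650 g/mol per base pair; neither value is stated in the text, and both are labelled as assumptions in the code.

```python
AVERAGE_BP_MASS = 650.0   # g/mol per base pair; approximate, assumed
PLASMID_LENGTH_BP = 2961  # assumed plasmid length; not stated in the text

def pmol_per_microgram(length_bp, bp_mass=AVERAGE_BP_MASS):
    """Picomoles of double-stranded DNA contained in one microgram."""
    grams_per_mole = length_bp * bp_mass
    moles = 1e-6 / grams_per_mole  # one microgram expressed in grams
    return moles * 1e12            # moles to picomoles

conversion = pmol_per_microgram(PLASMID_LENGTH_BP)
print(round(conversion, 2))  # 0.52
```

Under these assumptions one picomole of plasmid corresponds to roughly two micrograms, in line with the amounts quoted in the protocol.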

Step II: Ligation of adapters to pBluescript fragments:

9 tubes containing the following:

Tube i (where i = A-I)

Digested pBluescript (from step I): 1 pmol (= 2 µg)
Ligase buffer: 1X
All left adapters except adapter i: 10 pmol each
Biotinylated adapter i: 10 pmol
All right adapters: 10 pmol each
T4 DNA ligase: 800 U
Volume: 60 µl

Incubation at 20°C for 4 hrs.

Step III: Immobilization

Mix each of the tubes from step II with:

Equilibrated M-270 beads: 0.1 mg
Volume (2X B&W): 60 µl

Incubation at 25°C for 1 hr with rotation (rotator).

Three washes using 120 µl 2X B&W buffer. Additional wash with 120 µl 1X
PCR buffer. Beads resuspended in 10 µl 1X PCR buffer.

Step IV: PCR amplification of right adapter using Cy3-primer

Taq buffer: 0.8X
MgCl2: 6 mM
dNTPs: 50 µM
Cy3-primer: 100 pmol
Template on beads in 1X buffer: 10 µl (1 pmol immobilized fragment)
Taq polymerase: 0.4 U
Volume: 50 µl

Thermal cycling: 95°C, 2 min; then 30 cycles of 95°C, 15 sec; 58°C, 30 sec;
72°C, 15 sec.

Ethanol precipitation of PCR product to increase concentration.

Step V: Hybridization of PCR-amplified probes to microarrays

Each PCR-amplified probe is hybridized to a separate microarray. Each
microarray is printed on poly-L-lysine coated slides in exactly the same way and
includes 6 control spots (Cy3-labelled oligo) and 9 test spots. Each of the 9
test spots contains an oligo that is supposed to hybridize only to the PCR
product from a single adaptor template (the list of oligos).

Hybridization is to be carried out according to the following protocol:

1. Each PCR-amplified probe (containing Cy3-label) is to be dissolved in 1.7
µl of 2X hybridization solution (7X SSC and 0.6% SDS) and the
following mix added to obtain hybridization probes:

50x Denhardt's reagent: 0.5 µl
tRNA (4 mg/ml): 0.5 µl
Salmon sperm DNA (10 mg/ml): 0.5 µl
2x hybridization solution: 3.3 µl
Water: 3.5 µl
Total hybridization volume: 10.0 µl



2. Poly-L-lysine coated slides with printed microarrays are to be
pre-processed according to the protocol from P. O. Brown's laboratory:

a. DNA is to be cross-linked to the slides by irradiating with UV (60 mJ).

b. Slides are to be blocked in blocking solution (blocking solution contains 6
grams of succinic anhydride dissolved in 335 ml of 1-methyl-2-pyrrolidinone
and supplemented with 15 ml of boric acid, pH=8.0) for 20 minutes with
vigorous agitation, rinsed in distilled water, boiled in distilled water for 2
minutes, and washed for 2 minutes in cold 96% ethanol.

c. Right before the hybridization, hybridization probes are to be boiled for 2-5
minutes.

d. Hybridization probes are to be applied to individual microarrays and
hybridized for 12-16 hours under a cover slip in humidified chamber(s) inside
a hybridization oven or in a water bath at 50-65°C.

3. After hybridization, slides with microarrays are to be removed from the
humidified chamber(s) and washed as follows:

a. Once with 1xSSC, 0.05% SDS for 2-3 min

b. Once with 0.2xSSC for 2-3 min

c. Once with 0.05xSSC for 2-3 minutes

4. Slides are then to be dried by gentle centrifugation (1 5 000 rpm, 5 min)
5. Slides are then to be scanned with a laser appropriate for Cy3 label.

CLAIMS

1. A method of mapping a target nucleic acid molecule, the method
comprising the steps of:

(a) treating the target nucleic acid molecule with one or more
restriction endonucleases to produce one or more nucleic acid
fragments having first and second 5'- or 3'- single-stranded
overhanging ends,

(b) adding the nucleic acid fragments to a first set of overhang- 
adaptors, 

each overhang-adaptor of the first set comprising a nucleic acid
molecule comprising at least one 5'- or 3'-single-stranded end,

the single-stranded ends of the overhang-adaptors being of lengths
and orientations (i.e. 5'- or 3'-) corresponding to the lengths and
orientations of the overhanging single-strands of the cleavage sites
of the said restriction endonucleases,

wherein said first set comprises a collection of overhang-adaptors 
whose single-stranded ends collectively encode up to all possible 
permutations and combinations of the nucleotides A, C, G and T, 


and wherein each overhang-adaptor in the said first set is spatially 
separable from every other different overhang-adaptor in the first 
set; 

(c) contacting the said nucleic acid fragments with a nucleic acid ligase
to cause selective ligation of the nucleic acid fragments with those
overhang-adaptors whose 5'- or 3'- single-stranded ends are fully
complementary to the 5'- or 3'-overhanging single-stranded ends of
the nucleic acid fragments,

thus forming a plurality of separable populations of nucleic acid
fragments which are ligated at their first ends to a first
overhang-adaptor;

optionally, removing the unligated nucleic acid fragments;

(d) identifying the sequence of the second overhanging single-stranded 
end of the nucleic acid fragments; and 


(e) comparing the sequences of the ends of the nucleic acid fragments 
in order to produce a map of the target nucleic acid molecule. 

2. A method as claimed in claim 1 wherein the target nucleic acid molecule is
a DNA molecule.

3. A method as claimed in claim 1 or claim 2 wherein the restriction
endonuclease is a Type IIp or Type IIs restriction endonuclease.

4. A method as claimed in any one of the previous claims wherein the target
nucleic acid molecule is treated with more than one restriction endonuclease,
wherein the restriction endonucleases either all produce 5'-overhanging ends
or all produce 3'-overhanging ends.

5. A method as claimed in any one of the previous claims wherein the
overhang-adaptors of the first set are attached or capable of being attached to a
solid support.

6. A method as claimed in any one of the previous claims wherein the ligation
reaction in step (c) is carried out in free solution.

7. A method as claimed in any one of the previous claims wherein step (d) is 
carried out by: 

(d1) optionally releasing each population of ligated nucleic acid
fragments from the solid support,

selectively contacting each population of nucleic acid fragments
which were ligated at their first ends to a first overhang-adaptor
with a second set of overhang-adaptors,

each overhang-adaptor of the second set comprising a nucleic acid
molecule comprising at least one 5'- or 3'-single-stranded end,

the single-stranded ends of the overhang-adaptors of the second set
being of lengths and orientations (i.e. 5'- or 3'-) corresponding to
the lengths and orientations of the overhanging single-strands of the
cleavage sites of the said restriction endonucleases,

wherein said second set comprises a collection of
overhang-adaptors whose single-stranded ends collectively encode up to all
possible permutations and combinations of the nucleotides A, C, G
and T,

and wherein each overhang-adaptor in the said second set is
spatially distinguishable from every other different
overhang-adaptor in the second set;

(d2) contacting the nucleic acid fragments with a nucleic acid ligase to
cause selective ligation of the nucleic acid fragments with those
overhang-adaptors of the second set whose 5'- or 3'- single-stranded
ends are fully complementary to the second 5'- or 3'-overhanging
ends of the nucleic acid fragments;

thus forming a plurality of populations of nucleic acid fragments
which are ligated at their second ends to a second
overhang-adaptor, and

optionally removing the non-ligated nucleic acid fragments;

(d3) identifying the sequences of the first and second overhanging ends
of each of the nucleic acid fragments from the spatial positions of
the second overhang-adaptors to which the nucleic acid fragments
are ligated.

8. A method as claimed in claim 7 wherein steps (b)-(d2) are carried out
essentially simultaneously.

9. A method for detecting overhangs on a microarray address, the method
comprising the steps of:

providing one or more single-stranded nucleic acid adaptors each
comprising a first part and a second part, the first and second parts
being contiguous with one another, the first part having a free 5'- or
3'-end;

wherein the adaptor is preferably bound to a solid support;

contacting the adaptor with a target nucleic acid molecule having a
single-stranded overhang which is complementary with the first part
of the adaptor;

ligating the first part of the adaptor to the single-stranded overhang
of the target nucleic acid molecule;

contacting the second part of the adaptor with one or more labelled
single-stranded nucleic acid probes having a nucleotide sequence
which is complementary with the second part of the adaptor;

ligating the labelled single-stranded nucleic acid probe to the target
nucleic acid molecule;

optionally removing any unligated labelled single-stranded nucleic
acid probe and/or unligated nucleic acid molecule;

determining whether any target nucleic acid molecule has been
ligated to the first part of the adaptor by determining whether any
labelled probe is bound to the second part of the adaptor.

10. A method as claimed in claim 7, wherein the spatial positions of the
second overhang adaptors are determined using the method claimed in claim
8.

11. A method as claimed in any one of claims 1 to 6, wherein step (d) is
carried out by:

(d1) optionally releasing each population of ligated nucleic acid
fragments from the solid support,

selectively contacting each population of nucleic acid fragments
which are or were ligated at their first ends to a first
overhang-adaptor with a second set of overhang-adaptors,

each overhang-adaptor of the second set comprising a nucleic acid
molecule comprising at least one 5'- or 3'-single-stranded end,

the single-stranded ends of the overhang-adaptors of the second set
being of lengths and orientations (i.e. 5'- or 3'-) corresponding to
the lengths and orientations of the overhanging single-strands of the
cleavage sites of the said restriction endonucleases,

wherein said second set comprises a collection of
overhang-adaptors whose single-stranded ends collectively encode up to all
possible permutations and combinations of the nucleotides A, C, G
and T,

and wherein each different overhang-adaptor in the second set is
bound to an individual tag;

(d2) contacting the nucleic acid fragments with a nucleic acid ligase to
cause selective ligation of the nucleic acid fragments with those
overhang-adaptors of the second set whose 5'- or 3'- single-stranded
ends are fully complementary to the second 5'- or 3'-overhanging
ends of the nucleic acid fragments;

thus forming a plurality of populations of nucleic acid fragments
which are ligated at their second ends to a tagged second
overhang-adaptor, and

optionally removing the unligated nucleic acid fragments;

(d3) identifying the sequences of the first and second overhanging ends
of each of the nucleic acid fragments from the tags which are bound
to the second overhang-adaptors.

12. A method as claimed in claim 11, wherein the tag is a DNA molecule 

13. A method as claimed in any one of claims 1 to 6, wherein step (d) 
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comprises: 

(dl) optionally releasing each population of ligated nucleic acid 
fragments from the solid support, 

5 

selectively contacting each population of nucleic acid fragments 
which are or were ligated at their first ends to a first overhang- 
adaptor with a second set of overhang-adaptors, 

each overhang-adaptor of the second set comprising a nucleic acid
molecule comprising at least one 5'- or 3'-single-stranded end,

the single-stranded ends of the overhang-adaptors of the second set
being of lengths and orientations (i.e. 5'- or 3'-) corresponding to
the lengths and orientations of the overhanging single-strands of the
cleavage sites of the said restriction endonucleases,

wherein said second set comprises a collection of overhang-adaptors
whose single-stranded ends collectively encode up to all
possible permutations and combinations of the nucleotides A, C, G
and T,

wherein each different overhang-adaptor in the second set is bound
to an individual tag;

wherein the tag comprises a plurality of hybridisation sequences, 
each hybridisation sequence being representative of one or more of 
the nucleotides in the second overhanging end of the nucleic acid 
fragment; 



(d2) contacting the nucleic acid fragments with a nucleic acid ligase to
cause selective ligation of the nucleic acid fragments with those
overhang-adaptors of the second set whose 5'- or 3'-single-stranded
ends are fully complementary to the second 5'- or 3'-overhanging
ends of the nucleic acid fragments;

thus forming a plurality of populations of nucleic acid fragments
which are ligated at their second ends to a tagged second
overhang-adaptor, and

optionally, removing the unligated nucleic acid fragments;

(d3) contacting the tagged populations of nucleic acid fragments with a
set of labelled probes, each set of labelled probes comprising at
least one probe which is capable of binding specifically to at least
one of the hybridisation sequences;

(d4) identifying which labelled probe has bound to the hybridisation
sequence and identifying the spatial position of the bound probe;

(d5) removing the labelled probe from the hybridisation sequence; and

(d6) repeating steps (d3)-(d4), and optionally (d5), until the sequence of
the overhang of the second end of the nucleic acid fragment has
been determined.
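
By way of non-limiting illustration, the repeated probing of steps (d3) to (d6) of claim 13 may be sketched in Python. The sketch assumes that the tag carries one hybridisation sequence per overhang position, that each probing round uses four probes bearing distinguishable labels, and that spatial position is ignored; the label names, the tag model and the function names are hypothetical and form no part of the claims.

```python
# Hypothetical labels: one distinguishable label per nucleotide.
LABEL_FOR_BASE = {"A": "red", "C": "green", "G": "blue", "T": "yellow"}
BASE_FOR_LABEL = {label: base for base, label in LABEL_FOR_BASE.items()}


def make_tag(overhang):
    """Model a tag as a list of hybridisation sequences, one per position.

    Each hybridisation sequence is represented as a (position, base) pair.
    """
    return [(position, base) for position, base in enumerate(overhang)]


def probe_round(tag, position):
    """Steps (d3)-(d4): probe one position and report which label bound."""
    for hyb_position, base in tag:
        if hyb_position == position:
            return LABEL_FOR_BASE[base]
    raise ValueError("no hybridisation sequence for this position")


def decode_overhang(tag):
    """Step (d6): repeat the probing, one round per position."""
    bases = []
    for position in range(len(tag)):
        label = probe_round(tag, position)    # steps (d3)-(d4)
        bases.append(BASE_FOR_LABEL[label])   # probe then removed, step (d5)
    return "".join(bases)


decoded = decode_overhang(make_tag("ACGT"))   # hypothetical overhang
```

One round per hybridisation sequence is the simplest scheme; a hybridisation sequence representing more than one nucleotide, as the claim allows, would need fewer rounds but more labels.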

14. A method for mapping a nucleic acid molecule comprising the steps of: 

(A) treating the nucleic acid molecule with a first set of Type IIs
restriction endonucleases to produce one or more nucleic acid
fragments, each of the restriction endonucleases in the first set
producing different overhanging single-stranded ends to the other
restriction endonucleases in the first set,

and determining the sequences of the overhanging ends of the
nucleic acid fragments produced thereby;

(B) treating the nucleic acid molecule with a second set of Type IIs
restriction endonucleases to produce one or more nucleic acid
fragments, the second set comprising at least one Type IIs
restriction endonuclease which was not used in step (A) but which
has a cleavage site which is the same as one or more of the Type IIs
restriction endonucleases used in step (A);

and determining the sequences of the overhanging ends of the
nucleic acid fragments produced thereby;

(C) optionally treating the nucleic acid molecule with one or more
further sets of Type IIs restriction endonucleases to produce one or
more nucleic acid fragments,

and determining the sequences of the overhanging ends of the
nucleic acid fragments produced thereby;

(D) treating the nucleic acid molecule simultaneously with the Type IIs
restriction endonucleases from all of the sets to produce one or
more nucleic acid fragments,

and determining the sequences of the overhanging ends of the
nucleic acid fragments produced thereby;

(E) producing a map of the nucleic acid molecule by using the
information derived from steps (A)-(D).

15. A method as claimed in claim 14, wherein the nucleic acid molecule is a
DNA molecule. 

16. A method of mapping a target nucleic acid molecule, the method
comprising the steps of: 

(a) treating the target nucleic acid molecule with one or more
restriction endonucleases to produce one or more nucleic acid
fragments having first and second 5'- or 3'-single-stranded
overhanging ends,

(b) adding the nucleic acid fragments to a first set of
overhang-adaptors,

each overhang-adaptor of the first set comprising a nucleic acid
molecule comprising at least one 5'- or 3'-single-stranded end,

the single-stranded ends of the first overhang-adaptors being of
lengths and orientations (i.e. 5'- or 3'-) corresponding to the lengths
and orientations of the overhanging single-strands of the cleavage
sites of the said restriction endonucleases,

wherein said first set comprises a collection of overhang-adaptors
whose single-stranded ends collectively encode up to all possible
permutations and combinations of the nucleotides A, C, G and T
at all positions in the single-stranded ends except one or more
positions, the latter positions being taken by universal nucleotides,

and wherein each overhang-adaptor in the said first set is spatially
separable from every other different overhang-adaptor in the first
set;

(c1) contacting the said nucleic acid fragments with a nucleic acid ligase
to cause selective ligation of the nucleic acid fragments with those
overhang-adaptors whose 5'- or 3'-single-stranded ends are
complementary to the 5'- or 3'-overhanging single-stranded ends of
the nucleic acid fragments,

thus forming a plurality of separable populations of nucleic acid
fragments which are ligated at their first ends to a first
overhang-adaptor, and then

optionally, removing any nucleic acid fragments which are not
ligated to first overhang-adaptors;

(c2) releasing the nucleic acid fragments which are bound at their first
ends with a restriction endonuclease which creates a new first
overhanging single-stranded end in the nucleic acid fragment which
comprises the nucleotide or nucleotides in the nucleic acid
fragments which corresponded to the universal nucleotides;

(d1) selectively contacting each released population of nucleic acid
fragments with a second set of overhang-adaptors,

each overhang-adaptor of the second set comprising a nucleic acid
molecule comprising at least one 5'- or 3'-single-stranded end,

the single-stranded ends of the overhang-adaptors of the second set
being of lengths and orientations (i.e. 5'- or 3'-) corresponding to
the lengths and orientations of the overhanging single-strands of the
cleavage sites of the said restriction endonuclease,

wherein said second set comprises a collection of overhang- 
adaptors whose single-stranded ends collectively encode up to all 
possible permutations and combinations of the nucleotides A, C, G 
and T, 

and wherein each overhang-adaptor in the said second set is 
spatially distinguishable from every other different overhang- 
adaptor in the second set; 

contacting the nucleic acid fragments with a nucleic acid ligase to
cause selective ligation of the nucleic acid fragments with those
overhang-adaptors of the second set whose 5'- or 3'-single-stranded
ends are fully complementary to the second 5'- or 3'-overhanging
ends of the nucleic acid fragments;

thus forming a plurality of populations of nucleic acid fragments 
which are ligated at their second ends to a second overhang- 
adaptor, and then 

optionally, removing any unbound nucleic acid fragments; 

contacting the ligated nucleic acid fragments with labelled-adaptors 
which bind selectively to the new first overhanging end on the basis 
of the nucleotide or nucleotides in the new first overhanging end of 
the nucleic acid fragments which corresponded to the universal 
nucleotides; 

identifying the sequences of the first and second overhanging ends 
of each of the nucleic acid fragments from the spatial positions of 
the second overhang-adaptors to which the nucleic acid fragments 
are ligated, and from the labels which are attached to the first ends 
of the nucleic acid fragments; and 

comparing the sequences of the ends of the nucleic acid fragments
in order to produce a map of the target nucleic acid molecule.

17. A method as claimed in claim 16, wherein universal nucleotides are
present at one or two positions in the single-stranded ends of the first
overhang-adaptors.

18. A method as claimed in claim 16, wherein in step (a), the target nucleic
acid molecule is treated with one or more Type IIp or IIs restriction
endonucleases.



19. A method of sequencing a target nucleic acid molecule comprising the
steps of:

(i) ligating the target nucleic acid molecule with a linker nucleic acid,
the linker nucleic acid comprising a recognition site for a Type IIp
or Type IIs restriction endonuclease which will cleave the target
nucleic acid molecule;

(ii) treating the target nucleic acid molecule with a Type IIp or Type IIs
restriction endonuclease to produce one or more nucleic acid
fragments having single-stranded overhanging ends;

(iii) ligating one or more of the target nucleic acid fragments with a set
of labelled adaptors which specifically recognise one or more of the
nucleotides in the single-stranded overhanging ends of the nucleic
acid fragments, wherein the labelled adaptors comprise a
recognition site for a Type IIp or Type IIs restriction endonuclease
which will cleave the target nucleic acid molecule at a position one
or more nucleotides 5'- or 3'- to the first cleavage site;

(iv) identifying which labelled adaptors have bound to the nucleic acid
fragments, thus providing information on the nucleotide sequence
of at least part of the overhanging ends of the target nucleic acid
fragment;

(v) optionally, repeating steps (ii)-(iv) one or more times. 






20. A method as claimed in claim 19, wherein the target nucleic acid 
molecule is a DNA molecule. 
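
By way of non-limiting illustration, the cycle of cleavage, ligation and identification in steps (ii) to (v) of claim 19 may be sketched in Python. The sketch assumes four-nucleotide overhangs, labelled adaptors that recognise the whole overhang, and a second cleavage site displaced by exactly one overhang length in each cycle; these parameters, the integer labels and the function names are hypothetical and form no part of the claims.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def labelled_adaptors(overhang_length):
    """Map every possible adaptor end to its own (integer) label."""
    ends = ["".join(p) for p in product("ACGT", repeat=overhang_length)]
    return {end: label for label, end in enumerate(ends)}


def sequence_by_cycles(target, overhang_length=4, cycles=None):
    """Read the target one overhang per cycle, as in steps (ii)-(v)."""
    label_for_end = labelled_adaptors(overhang_length)
    end_for_label = {label: end for end, label in label_for_end.items()}
    max_cycles = len(target) // overhang_length
    if cycles is None or cycles > max_cycles:
        cycles = max_cycles
    reads = []
    for cycle in range(cycles):
        start = cycle * overhang_length                      # step (ii): cleave
        overhang = target[start:start + overhang_length]
        label = label_for_end[reverse_complement(overhang)]  # step (iii): ligate
        adaptor_end = end_for_label[label]                   # step (iv): identify
        reads.append(reverse_complement(adaptor_end))
    return "".join(reads)


read = sequence_by_cycles("ACGTTGCAAG")   # hypothetical target sequence
```

Nucleotides beyond the last complete overhang are left unread in this sketch, which is why the ten-nucleotide example yields only eight.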